Exercise 2

Create data on study time and regular exam scores.

study_time <- c(1, 3, 10, 12, 6, 3, 8, 4, 1, 5)
point <- c(20, 40, 100, 80, 50, 50, 70, 50, 10, 60)

Run Pearson's test of no correlation. In R, use cor.test() and specify method="pearson".

> cor.test(study_time, point, method="pearson")

        Pearson's product-moment correlation

data:  study_time and point 
t = 6.1802, df = 8, p-value = 0.0002651
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.6542283 0.9786369 
sample estimates:
      cor 
0.9092974

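The null hypothesis is that the correlation coefficient equals 0, i.e. the two variables are uncorrelated. The alternative hypothesis is that the correlation coefficient is not equal to 0.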

The computed t value is 6.1802, and with 8 degrees of freedom the p-value is 0.0002651. At the 5% significance level, the null hypothesis is rejected.
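The t value and p-value reported by cor.test() can be checked by hand. The following is a minimal sketch that uses the standard relation t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; the variable names t_manual and p_manual are introduced here for illustration only.

```r
study_time <- c(1, 3, 10, 12, 6, 3, 8, 4, 1, 5)
point <- c(20, 40, 100, 80, 50, 50, 70, 50, 10, 60)

n  <- length(study_time)
r  <- cor(study_time, point)               # sample Pearson correlation
df <- n - 2                                # degrees of freedom

t_manual <- r * sqrt(df) / sqrt(1 - r^2)   # t statistic
p_manual <- 2 * pt(-abs(t_manual), df)     # two-sided p-value

# Compare with the values computed by cor.test()
res <- cor.test(study_time, point, method = "pearson")
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```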

The correlation coefficient computed from the sample is 0.9092974, with a 95% confidence interval of 0.654 to 0.979.
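The confidence interval is asymmetric around the estimate because cor.test() builds it on the Fisher z scale and then transforms back. A minimal sketch of that calculation, with z, se and ci as illustrative names:

```r
study_time <- c(1, 3, 10, 12, 6, 3, 8, 4, 1, 5)
point <- c(20, 40, 100, 80, 50, 50, 70, 50, 10, 60)

n <- length(study_time)
r <- cor(study_time, point)

z  <- atanh(r)              # Fisher z-transform of r
se <- 1 / sqrt(n - 3)       # approximate standard error on the z scale

# 95% interval on the z scale, transformed back to the correlation scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci
```

The two values in ci should agree with the interval printed by cor.test().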

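Exercise 4

The data are generated randomly. In the output below, 洋食 means Western food and 和食 means Japanese food; 甘党 means a preference for sweets and 辛党 means a preference for alcoholic drinks.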

> food<-sample(c("洋食", "和食"), 20, replace=TRUE)
> food
 [1] "洋食" "洋食" "洋食" "洋食" "洋食" "洋食" "和食" "和食" "洋食" "和食"
[11] "和食" "和食" "和食" "和食" "和食" "洋食" "洋食" "洋食" "和食" "洋食"
> taste<-sample(c("甘党", "辛党"), 20, replace=TRUE)
> taste
 [1] "辛党" "辛党" "甘党" "甘党" "甘党" "甘党" "辛党" "辛党" "甘党" "辛党"
[11] "甘党" "辛党" "辛党" "辛党" "辛党" "甘党" "辛党" "甘党" "甘党" "辛党"

Create a cross-tabulation.

> tasting_table <-  table(food, taste)
> tasting_table
      taste
food   甘党 辛党
  洋食    7    4
  和食    2    7
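Because sample() draws at random, rerunning the commands above will generally give a different table. One way to reproduce the table shown here is to enter the counts directly, as in the sketch below. Calling set.seed() before sample() would make a run repeatable, but the original gives no seed, so the exact draws above cannot be recovered that way.

```r
# Rebuild the 2 x 2 table from the counts shown above.
# matrix() fills column by column: 甘党 = (7, 2), 辛党 = (4, 7).
tasting_table <- as.table(matrix(
  c(7, 2, 4, 7),
  nrow = 2,
  dimnames = list(food = c("洋食", "和食"), taste = c("甘党", "辛党"))
))
tasting_table

# The test can then be run on the rebuilt table
# (the warning about the approximation is suppressed here).
res <- suppressWarnings(chisq.test(tasting_table, correct = FALSE))
res$statistic
```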

Run the chi-squared test.

> chisq.test(tasting_table, correct=FALSE)

        Pearson's Chi-squared test

data:  tasting_table 
X-squared = 3.4303, df = 1, p-value = 0.06401

Warning message:
In chisq.test(tasting_table, correct = FALSE) :
  Chi-squared approximation may be incorrect

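Under the null hypothesis, food preference and drink preference are independent.

The chi-squared value is 3.4303. With 1 degree of freedom, the p-value is 0.064. Since the p-value is larger than the 5% significance level, the null hypothesis cannot be rejected. Therefore, we cannot say that food preference and drink preference are not independent.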
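The warning in the output appears because some expected cell counts are below 5, which is the condition under which chisq.test() questions the chi-squared approximation. The sketch below recomputes the statistic from the expected counts and, as one possible alternative for small samples, runs Fisher's exact test. The counts are typed in from the table above, and the names observed, expected, chi2 and p are illustrative.

```r
# Observed counts from the cross-tabulation above
observed <- matrix(
  c(7, 2, 4, 7),
  nrow = 2,
  dimnames = list(food = c("洋食", "和食"), taste = c("甘党", "辛党"))
)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected
any(expected < 5)   # TRUE here, which is what triggers the warning

# Pearson chi-squared statistic without continuity correction
chi2 <- sum((observed - expected)^2 / expected)
p    <- pchisq(chi2, df = 1, lower.tail = FALSE)
c(chi2 = chi2, p = p)

# Fisher's exact test does not rely on the chi-squared approximation
fisher_res <- fisher.test(observed)
fisher_res$p.value
```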

Effect of sample size

Use the data below to examine the effect of sample size on the test of no correlation.

study_time <- c(1, 3, 10, 12, 6, 3, 8, 4, 1, 5)
point <- c(20, 40, 100, 80, 50, 50, 70, 50, 10, 60)

In this case,

p-value = 0.0002651

Next, double the sample size.

> study_time_2 <- rep(study_time, 2)
> study_time_2
 [1]  1  3 10 12  6  3  8  4  1  5  1  3 10 12  6  3  8  4  1  5
> point_2 <- rep(point, 2)
> point_2
 [1]  20  40 100  80  50  50  70  50  10  60  20  40 100  80  50  50  70  50
[19]  10  60

> cor.test(study_time_2, point_2, method="pearson")

        Pearson's product-moment correlation

data:  study_time_2 and point_2 
t = 9.2703, df = 18, p-value = 2.829e-08
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 0.7810631 0.9639436 
sample estimates:
      cor 
0.9092974

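The p-value is now 2.829e-08, even though the correlation coefficient is unchanged at 0.9092974. This shows that a larger sample size makes the result more likely to be significant.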
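The same experiment can be repeated for several replication factors. This is a minimal sketch in which k and p_values are illustrative names. Note that repeating the same observations does not produce independent data, so it only illustrates how the p-value depends on n when r is held fixed.

```r
study_time <- c(1, 3, 10, 12, 6, 3, 8, 4, 1, 5)
point <- c(20, 40, 100, 80, 50, 50, 70, 50, 10, 60)

k <- c(1, 2, 5, 10)   # how many times the data set is repeated

# p-value of the test of no correlation for each replication factor
p_values <- sapply(k, function(times) {
  cor.test(rep(study_time, times), rep(point, times),
           method = "pearson")$p.value
})

data.frame(n = k * length(study_time), p_value = p_values)
```

Since t = r * sqrt(n - 2) / sqrt(1 - r^2), the t value grows with n for a fixed r, so the p-value shrinks as the data are repeated.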

Memo on R

R - ElemStatLearn - Coutries

Operating on a data frame by strata