Experiments, samples and populations
Writing out the last expression in full, we obtain the form most useful for calculations, which reads
$$V_{xy} = \frac{1}{N}\left(\sum_{i=1}^{N} x_i y_i\right) - \frac{1}{N^2}\left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right).$$
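As an illustration, the calculational form above can be sketched in a few lines of Python. This is a minimal sketch, not a library routine: the function name `sample_covariance` and the choice of plain Python lists as input are assumptions made here for the example.

```python
def sample_covariance(xs, ys):
    """Return V_xy = (1/N) * sum(x_i * y_i) - (1/N^2) * sum(x_i) * sum(y_i).

    xs and ys are assumed to be equal-length, non-empty lists of numbers.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products x_i * y_i
    return sum_xy / n - (sum(xs) * sum(ys)) / n**2


# Small hypothetical data set: y_i = 2 * x_i, so V_xy = 28/3 - 8 = 4/3.
v_xy = sample_covariance([1, 2, 3], [2, 4, 6])
```

Only the three sums appearing in the formula are needed, so the data can be processed in a single pass.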
We may also define the closely related sample correlation by
$$r_{xy} = \frac{V_{xy}}{s_x s_y},$$
which can take values between −1 and +1. If the $x_i$ and $y_i$ are independent then $V_{xy} = 0 = r_{xy}$,
and from (1.12) we see that $\overline{xy} = \bar{x} \cdot \bar{y}$. It should also be noted that the value of $r_{xy}$ is not altered
by shifts in the origin or by changes in the scale of the $x_i$ or $y_i$. In other words, if $x' = ax + b$ and
$y' = cy + d$, where $a, b, c, d$ are constants, then $r_{x'y'} = r_{xy}$.
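The invariance of the sample correlation under a shift of origin and a change of scale can be checked numerically. The sketch below is illustrative only: the function name `sample_correlation`, the data values and the constants are all hypothetical, and the scale factors $a$ and $c$ are assumed to be positive (a negative scale factor on one variable would reverse the sign of the correlation). The standard deviations are taken as the square root of the mean square minus the squared mean.

```python
import math


def sample_correlation(xs, ys):
    """Return r_xy = V_xy / (s_x * s_y) for equal-length lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance: mean of the products minus product of the means.
    v_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # Standard deviations: sqrt(mean of squares minus square of mean).
    s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x**2)
    s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y**2)
    return v_xy / (s_x * s_y)


# Hypothetical data and constants, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 5.0, 6.0]
a, b, c, d = 2.5, -4.0, 0.1, 10.0  # a > 0 and c > 0 assumed

xs_prime = [a * x + b for x in xs]  # x' = a x + b
ys_prime = [c * y + d for y in ys]  # y' = c y + d

r = sample_correlation(xs, ys)
r_prime = sample_correlation(xs_prime, ys_prime)
print(math.isclose(r, r_prime, rel_tol=1e-9))
```

The two values agree to within floating-point rounding, as expected from the scaling of $V_{xy}$ by $ac$ and of $s_x s_y$ by the same factor.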
Example 1.6. Ten UK citizens are selected at random and their heights and weights
are found to be as follows (to the nearest cm or kg respectively):
Person        A    B    C    D    E    F    G    H    I    J
Height (cm)   194  168  177  180  171  190  151  169  175  182
Weight (kg)   75   53   72   80   75   75   57   67   46   68
Compute the sample correlation between the heights and weights.
Solution. In order to find the sample correlation, we begin by calculating the following sums
(where $x_i$ are the heights and $y_i$ are the weights):
$$\sum_i x_i = 1757, \qquad \sum_i y_i = 668,$$
$$\sum_i x_i^2 = 310041, \qquad \sum_i y_i^2 = 45746, \qquad \sum_i x_i y_i = 118029.$$
The sample consists of $N = 10$ pairs of numbers, so the means of the $x_i$ and of the $y_i$ are given
by $\bar{x} = 175.7$ and $\bar{y} = 66.8$. Also, $\overline{xy} = 11802.9$. Similarly, the standard deviations of the $x_i$ and
$y_i$ are computed, using (1.8), as
$$s_x = \sqrt{\frac{310041}{10} - \left(\frac{1757}{10}\right)^2} = 11.6,$$
$$s_y = \sqrt{\frac{45746}{10} - \left(\frac{668}{10}\right)^2} = 10.6.$$
Thus the sample correlation is given by
$$r_{xy} = \frac{\overline{xy} - \bar{x} \cdot \bar{y}}{s_x s_y} = \frac{11802.9 - 175.7 \cdot 66.8}{11.6 \cdot 10.6} = 0.54.$$
Thus there is a moderate positive correlation between the heights and weights of the people
measured.
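The arithmetic of Example 1.6 can be reproduced with a short Python script. This is a sketch for checking the numbers only; the variable names are chosen here for illustration, and the standard deviations are evaluated as the square root of the mean square minus the squared mean, as in the solution.

```python
import math

# Data from Example 1.6.
heights = [194, 168, 177, 180, 171, 190, 151, 169, 175, 182]  # x_i in cm
weights = [75, 53, 72, 80, 75, 75, 57, 67, 46, 68]            # y_i in kg
n = len(heights)

# The five sums used in the solution.
sum_x = sum(heights)
sum_y = sum(weights)
sum_xx = sum(x * x for x in heights)
sum_yy = sum(y * y for y in weights)
sum_xy = sum(x * y for x, y in zip(heights, weights))

# Means, standard deviations and sample correlation.
mean_x, mean_y, mean_xy = sum_x / n, sum_y / n, sum_xy / n
s_x = math.sqrt(sum_xx / n - mean_x**2)
s_y = math.sqrt(sum_yy / n - mean_y**2)
r_xy = (mean_xy - mean_x * mean_y) / (s_x * s_y)

print(sum_x, sum_y, sum_xx, sum_yy, sum_xy)
# prints: 1757 668 310041 45746 118029
print(f"s_x = {s_x:.1f}, s_y = {s_y:.1f}, r_xy = {r_xy:.2f}")
# prints: s_x = 11.6, s_y = 10.6, r_xy = 0.54
```

The script keeps full precision for $s_x$ and $s_y$ rather than the rounded values 11.6 and 10.6, but the result still rounds to 0.54.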
It is straightforward to generalize the above discussion to data samples of arbitrary dimension,
the only complication being one of notation. We choose to denote the $i$th data item from an
$n$-dimensional sample as $(x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$, where the bracketed superscript runs from 1 to $n$ and
labels the elements within a given data item whereas the subscript $i$ runs from 1 to $N$ and labels the