Experiments, samples and populations


Writing out the last expression in full, we obtain the form most useful for calculations, which reads

$$V_{xy} = \frac{1}{N}\left(\sum_{i=1}^{N} x_i y_i\right) - \frac{1}{N^2}\left(\sum_{i=1}^{N} x_i\right)\left(\sum_{i=1}^{N} y_i\right).$$
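
As a minimal sketch, the computational form above translates almost term by term into Python. The function name `sample_covariance` and the input check are illustrative assumptions, not part of the text.

```python
def sample_covariance(xs, ys):
    """Return V_xy = (1/N) sum(x_i y_i) - (1/N^2) sum(x_i) sum(y_i)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of products x_i y_i
    return sum_xy / n - (sum(xs) * sum(ys)) / n**2
```

Only three running sums are needed, so the data can be processed in a single pass.
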
We may also define the closely related sample correlation by

$$r_{xy} = \frac{V_{xy}}{s_x s_y},$$

which can take values between $-1$ and $+1$. If the $x_i$ and $y_i$ are independent then
$V_{xy} = 0 = r_{xy}$, and from (1.12) we see that $\overline{xy} = \bar{x}\,\bar{y}$. It should also
be noted that the value of $r_{xy}$ is not altered by shifts in the origin or by changes in the scale
of the $x_i$ or $y_i$. In other words, if $x' = ax + b$ and $y' = cy + d$, where $a, b, c, d$ are
constants with $a$ and $c$ of the same sign, then $r_{x'y'} = r_{xy}$; if $a$ and $c$ have opposite
signs, the sign of the correlation is reversed.
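
The invariance of the correlation under a shift of origin and a change of scale can be checked numerically. The following is a sketch only: the data values and the constants `a, b, c, d` are arbitrary illustrative choices (with positive scale factors), the function name `sample_correlation` is an assumption, and the standard deviation is taken as $s^2 = \overline{x^2} - \bar{x}^2$, which is how (1.8) is applied later in this section.

```python
import math

def sample_correlation(xs, ys):
    """Return r_xy = V_xy / (s_x s_y), with s^2 = mean(x^2) - mean(x)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    v_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x**2)
    s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y**2)
    return v_xy / (s_x * s_y)

# Arbitrary illustrative data (not taken from the text).
xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 5.0, 6.0]

# x' = a x + b and y' = c y + d, with positive scale factors a and c assumed.
a, b, c, d = 2.5, -4.0, 0.3, 10.0
xs_new = [a * x + b for x in xs]
ys_new = [c * y + d for y in ys]

r_original = sample_correlation(xs, ys)
r_transformed = sample_correlation(xs_new, ys_new)
print(math.isclose(r_original, r_transformed, rel_tol=1e-9))
```
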
Example 1.6. Ten UK citizens are selected at random and their heights and weights
are found to be as follows (to the nearest cm or kg respectively):

    Person         A    B    C    D    E    F    G    H    I    J
    Height (cm)  194  168  177  180  171  190  151  169  175  182
    Weight (kg)   75   53   72   80   75   75   57   67   46   68

Compute the sample correlation between the heights and weights.

Solution. In order to find the sample correlation, we begin by calculating the following sums
(where $x_i$ are the heights and $y_i$ are the weights):

$$\sum_i x_i = 1757, \qquad \sum_i y_i = 668,$$

$$\sum_i x_i^2 = 310041, \qquad \sum_i y_i^2 = 45746, \qquad \sum_i x_i y_i = 118029.$$
The sample consists of $N = 10$ pairs of numbers, so the means of the $x_i$ and of the $y_i$ are given
by $\bar{x} = 175.7$ and $\bar{y} = 66.8$. Also, $\overline{xy} = 11802.9$. Similarly, the standard
deviations of the $x_i$ and $y_i$ are computed, using (1.8), as

$$s_x = \sqrt{\frac{310041}{10} - \left(\frac{1757}{10}\right)^2} = 11.6,$$

$$s_y = \sqrt{\frac{45746}{10} - \left(\frac{668}{10}\right)^2} = 10.6.$$
Thus the sample correlation is given by

$$r_{xy} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{s_x s_y} = \frac{11802.9 - 175.7 \cdot 66.8}{11.6 \cdot 10.6} = 0.54.$$

Thus there is a moderate positive correlation between the heights and weights of the people
measured.
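
The arithmetic of Example 1.6 can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the data are those of the table in the example, and the standard deviations are formed as in the solution, from the mean of the squares minus the square of the mean.

```python
import math

# Data from Example 1.6 (persons A to J).
heights = [194, 168, 177, 180, 171, 190, 151, 169, 175, 182]  # x_i, in cm
weights = [75, 53, 72, 80, 75, 75, 57, 67, 46, 68]            # y_i, in kg

n = len(heights)
sum_x = sum(heights)                                  # 1757
sum_y = sum(weights)                                  # 668
sum_xx = sum(x * x for x in heights)                  # 310041
sum_yy = sum(y * y for y in weights)                  # 45746
sum_xy = sum(x * y for x, y in zip(heights, weights)) # 118029

mean_x = sum_x / n
mean_y = sum_y / n
s_x = math.sqrt(sum_xx / n - mean_x**2)  # about 11.6
s_y = math.sqrt(sum_yy / n - mean_y**2)  # about 10.6

# Sample correlation r_xy = (mean(xy) - mean(x) mean(y)) / (s_x s_y).
r_xy = (sum_xy / n - mean_x * mean_y) / (s_x * s_y)
print(f"s_x = {s_x:.1f}, s_y = {s_y:.1f}, r_xy = {r_xy:.2f}")
```

Carrying the unrounded standard deviations through the final division gives the same value, 0.54 to two decimal places, as the hand calculation with the rounded figures.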

It is straightforward to generalize the above discussion to data samples of arbitrary dimension,
the only complication being one of notation. We choose to denote the $i$th data item from an
$n$-dimensional sample as $(x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(n)})$, where the bracketed superscript
runs from 1 to $n$ and labels the elements within a given data item whereas the subscript $i$ runs
from 1 to $N$ and labels the