
the rectangle height should be

                              \[ \text{Rectangle height} = \frac{\text{bin frequency}}{\text{bin width}}. \]

               In passing from the original data to either a frequency distribution or a histogram, we have lost some
               information because we no longer have the individual observations. However, this information
               loss is often small compared with the conciseness and ease of interpretation gained in using the
               frequency distribution and histogram.
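
The rectangle-height rule above can be illustrated with a short sketch. The data values, the bin edges and the helper name `rectangle_heights` are all made up for illustration; the only thing taken from the text is that each height is the bin frequency divided by the bin width, so that the area of each rectangle equals the frequency of its bin.

```python
def rectangle_heights(data, edges):
    """Return (frequencies, heights) for the bins [edges[i], edges[i+1])."""
    freqs = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(freqs)):
            last = i == len(freqs) - 1
            # The final bin is closed on the right so the largest edge is counted.
            if edges[i] <= x < edges[i + 1] or (last and x == edges[-1]):
                freqs[i] += 1
                break
    # Rectangle height = bin frequency / bin width.
    heights = [f / (edges[i + 1] - edges[i]) for i, f in enumerate(freqs)]
    return freqs, heights


data = [0.5, 1.2, 1.8, 2.5, 3.1, 3.9, 4.4, 7.0]  # hypothetical observations
edges = [0, 2, 4, 8]                              # deliberately unequal bin widths
freqs, heights = rectangle_heights(data, edges)

# The rectangle areas (height * width) recover the bin frequencies.
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
print(freqs, heights, areas)
```

Once only `freqs` and `edges` are kept, the individual values in `data` can no longer be recovered, which is the information loss described above.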


                                                 2.2. Estimation




                     Estimators and sampling distributions


               In general, the population $P(x)$ from which a sample $x_1, x_2, \ldots, x_N$ is drawn is unknown. The
               central aim of statistics is to use the sample values $x_i$ to infer certain properties of the unknown
               population $P(x)$, such as its mean, variance and higher moments. To keep our discussion in
               general terms, let us denote the various properties of the population by $a_1, a_2, \ldots$, or collectively
               by $\mathbf{a}$. Moreover, we make the dependence of the population on the values of these quantities
               explicit by writing the population as $P(x|\mathbf{a})$. For the moment, we are assuming that the sample
               values $x_i$ are independent and drawn from the same (one-dimensional) population $P(x|\mathbf{a})$, in
               which case
                              \[ P(\mathbf{x}|\mathbf{a}) = P(x_1|\mathbf{a})\,P(x_2|\mathbf{a}) \cdots P(x_N|\mathbf{a}). \]
               Suppose we wish to estimate the value of one of the quantities $a_1, a_2, \ldots$, which we will denote
               simply by $a$. Since the sample values $x_i$ are our only source of information, any estimate of $a$ must
               be some function of the $x_i$, i.e. some sample statistic. Such a statistic is called an estimator of $a$
               and is usually denoted by $\hat{a}(\mathbf{x})$, where $\mathbf{x}$ denotes the sample elements $x_1, x_2, \ldots, x_N$.
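
As a concrete illustration of these definitions, the sketch below assumes a Gaussian population with parameters (mu, sigma); that choice, the numerical values and the function names are assumptions made purely for the example and are not part of the discussion above. It evaluates the joint probability density of an independent sample as a product of one-dimensional factors and uses the sample mean as one possible estimator of the population mean.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Population P(x|a) with a = (mu, sigma); the Gaussian form is an assumption."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def joint_density(sample, mu, sigma):
    """P(x|a) = P(x_1|a) P(x_2|a) ... P(x_N|a) for independent sample values."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in sample)


def mean_estimator(sample):
    """A sample statistic (a function of the x_i only) estimating the population mean."""
    return sum(sample) / len(sample)


rng = random.Random(0)               # fixed seed so the run is repeatable
true_mu, true_sigma = 2.0, 1.5       # hypothetical 'unknown' population values
sample = [rng.gauss(true_mu, true_sigma) for _ in range(10)]

a_hat = mean_estimator(sample)
print("estimate of the mean:", a_hat)
print("joint density at the true parameters:", joint_density(sample, true_mu, true_sigma))
```

Note that `mean_estimator` never sees `true_mu`: it depends on the sample alone, which is what makes it a sample statistic.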
                   Since an estimator $\hat{a}$ is a function of the sample values of the random variables $x_1, x_2, \ldots, x_N$,
               it too must be a random variable. In other words, if a number of random samples, each of the same
               size $N$, are taken from the (one-dimensional) population $P(x|\mathbf{a})$ then the value of the estimator
               $\hat{a}$ will vary from one sample to the next and in general will not be equal to the true value $a$. This
               variation of the estimator is described by its sampling distribution $P(\hat{a}|\mathbf{a})$. This is given by

                              \[ P(\hat{a}|\mathbf{a})\,d\hat{a} = P(\mathbf{x}|\mathbf{a})\,d^N x, \]
               where $d^N x$ is the infinitesimal 'volume' in $\mathbf{x}$-space lying between the 'surfaces' $\hat{a}(\mathbf{x}) = \hat{a}$ and
               $\hat{a}(\mathbf{x}) = \hat{a} + d\hat{a}$. The form of the sampling distribution generally depends upon the estimator
               under consideration and upon the form of the population from which the sample was drawn,
               including, as indicated, the true values of the quantities $\mathbf{a}$. It is also usually dependent on the
               sample size $N$.
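
The sampling distribution can be explored numerically by drawing many samples of the same size and recording the estimate obtained from each. The following is a minimal sketch under stated assumptions: the population is taken to be Gaussian with hypothetical mean 2.0 and standard deviation 1.5, the estimator is the sample mean, and the names `simulate_estimates` and `spread` are introduced here for illustration only.

```python
import random
import statistics


def sample_mean(sample):
    """Estimator under study: the mean of the sample values."""
    return sum(sample) / len(sample)


def simulate_estimates(estimator, n, n_samples, mu=2.0, sigma=1.5, seed=0):
    """Apply the estimator to n_samples independent samples, each of size n.

    The Gaussian population and its parameters are assumptions of this sketch.
    """
    rng = random.Random(seed)
    return [
        estimator([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_samples)
    ]


spread = {}
for n in (5, 50):
    estimates = simulate_estimates(sample_mean, n, n_samples=2000)
    # The list of estimates approximates the sampling distribution for this n.
    spread[n] = (statistics.mean(estimates), statistics.stdev(estimates))
    print(f"N = {n}: mean of estimates = {spread[n][0]:.3f}, "
          f"standard deviation = {spread[n][1]:.3f}")
```

A histogram of `estimates` gives an empirical picture of the sampling distribution; comparing the two values of `n` shows how its width changes with the sample size.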



                     Consistency, bias and efficiency of estimators


               For any particular quantity a, we may in fact define any number of different estimators, each of
               which will have its own sampling distribution. The quality of a given estimator $\hat{a}$ may be assessed
               by investigating certain properties of its sampling distribution $P(\hat{a}|\mathbf{a})$. In particular, an estimator
               $\hat{a}$ is usually judged on the three criteria of consistency, bias and efficiency, each of which we now
               discuss.

