the rectangle height should be
\[ \text{rectangle height} = \frac{\text{bin frequency}}{\text{bin width}}. \]
In passing from the original data to either a frequency distribution or a histogram, we have lost some
information because we no longer have the individual observations. However, this information
loss is often small compared with the conciseness and ease of interpretation gained in using the
frequency distribution and histogram.
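
As a minimal sketch of the height rule above: with height = frequency / width, the area of each rectangle equals the bin frequency, so bins of unequal width remain comparable. The helper name rectangle_heights, the data and the bin edges below are invented for illustration and are not from the text.

```python
def rectangle_heights(data, edges):
    """Return (frequencies, heights) for the bins [edges[i], edges[i+1]).

    Hypothetical helper: height = bin frequency / bin width.
    """
    n_bins = len(edges) - 1
    freqs = [0] * n_bins
    for value in data:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            # The final bin is closed on the right so the largest edge is counted.
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                freqs[i] += 1
                break
    heights = [f / (edges[i + 1] - edges[i]) for i, f in enumerate(freqs)]
    return freqs, heights


data = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]   # illustrative observations
edges = [0, 2, 4, 10]                    # bin widths 2, 2 and 6
freqs, heights = rectangle_heights(data, edges)

# Area of each rectangle (height * width) recovers the bin frequency.
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
print(freqs, heights, areas)
```

Note that the individual observations cannot be recovered from freqs and heights alone, which is the information loss described above.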
2.2. Estimation
Estimators and sampling distributions
In general, the population P(x) from which a sample x_1, x_2, ..., x_N is drawn is unknown. The
central aim of statistics is to use the sample values x_i to infer certain properties of the unknown
population P(x), such as its mean, variance and higher moments. To keep our discussion in
general terms, let us denote the various properties of the population by a_1, a_2, ..., or collectively
by a. Moreover, we make the dependence of the population on the values of these quantities
explicit by writing the population as P(x|a). For the moment, we are assuming that the sample
values x_i are independent and drawn from the same (one-dimensional) population P(x|a), in
which case the joint distribution of the whole sample, with x now denoting x_1, x_2, ..., x_N, is
\[ P(x|a) = P(x_1|a) P(x_2|a) \cdots P(x_N|a). \]
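
The factorisation can be illustrated with a short sketch. It assumes, purely for illustration, that the population is Gaussian, with a standing for the pair (mu, sigma); the text does not specify any particular population, and the function names and sample values are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Assumed population P(x|a): a Gaussian with a = (mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def joint_density(sample, mu, sigma):
    """Product of the single-observation densities, valid for independent x_i."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in sample)


def log_joint_density(sample, mu, sigma):
    """Sum of log densities; numerically safer than the product for large N."""
    return sum(math.log(gaussian_pdf(x, mu, sigma)) for x in sample)


sample = [4.8, 5.1, 5.3, 4.9]            # illustrative sample values
p = joint_density(sample, mu=5.0, sigma=0.2)
log_p = log_joint_density(sample, mu=5.0, sigma=0.2)
print(p, log_p)
```

For large N the product of many small densities can underflow, which is why the logarithmic form is usually preferred in practice.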
Suppose we wish to estimate the value of one of the quantities a_1, a_2, ..., which we will denote
simply by a. Since the sample values x_i are our only source of information, any estimate of a must
be some function of the x_i, i.e. some sample statistic. Such a statistic is called an estimator of a
and is usually denoted by â(x), where x denotes the sample elements x_1, x_2, ..., x_N.
Since an estimator â is a function of the sample values of the random variables x_1, x_2, ..., x_N,
it too must be a random variable. In other words, if a number of random samples, each of the same
size N, are taken from the (one-dimensional) population P(x|a) then the value of the estimator
â will vary from one sample to the next and in general will not be equal to the true value a. This
variation of the estimator is described by its sampling distribution P(â|a). This is given by
\[ P(\hat{a}|a)\,d\hat{a} = P(x|a)\,d^N x, \]
where d^N x is the infinitesimal 'volume' in x-space lying between the 'surfaces' â(x) = â and
â(x) = â + dâ. The form of the sampling distribution generally depends upon the estimator
under consideration and upon the form of the population from which the sample was drawn,
including, as indicated, the true values of the quantities a. It is also usually dependent on the
sample size N.
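
A sampling distribution can be approximated numerically by drawing many samples of the same size and evaluating the estimator on each. The sketch below assumes a Gaussian population and takes the sample mean as the estimator of the population mean; both choices, together with the function names and parameter values, are illustrative and not prescribed by the text.

```python
import random
import statistics


def sample_mean(xs):
    """One possible estimator of the population mean."""
    return sum(xs) / len(xs)


def simulate_estimates(estimator, n_samples, sample_size, mu, sigma, seed=0):
    """Evaluate the estimator on n_samples independent samples of size sample_size.

    The population is assumed Gaussian with mean mu and standard deviation sigma.
    """
    rng = random.Random(seed)
    return [
        estimator([rng.gauss(mu, sigma) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


# Estimates scatter around the true value mu = 3.0 rather than equalling it.
estimates_25 = simulate_estimates(sample_mean, 2000, 25, mu=3.0, sigma=2.0)
estimates_100 = simulate_estimates(sample_mean, 2000, 100, mu=3.0, sigma=2.0)

centre_25 = statistics.fmean(estimates_25)
spread_25 = statistics.stdev(estimates_25)    # close to sigma / sqrt(25)
spread_100 = statistics.stdev(estimates_100)  # close to sigma / sqrt(100)
print(centre_25, spread_25, spread_100)
```

A histogram of estimates_25 approximates P(â|a) for N = 25, and the narrower spread for N = 100 shows the dependence on sample size.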
Consistency, bias and efficiency of estimators
For any particular quantity a, we may in fact define any number of different estimators, each of
which will have its own sampling distribution. The quality of a given estimator â may be assessed
by investigating certain properties of its sampling distribution P(â|a). In particular, an estimator
â is usually judged on the three criteria of consistency, bias and efficiency, each of which we now
discuss.