Some basic estimators
If the true mean of the population is unknown, however, a natural alternative is to replace $\mu$ by $\bar{x}$ in (2.6), so that our estimator is simply the sample variance $s^2$ given by
\[
s^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^{2} .
\]
In order to determine the properties of this estimator, we must compute $E(s^2)$ and $\mathrm{Var}(s^2)$. This
task is straightforward but lengthy. However, for the investigation of the properties of a central
moment of the sample, there exists a useful trick that simplifies the calculation. We can assume,
with no loss of generality, that the mean $\mu_1$ of the population from which the sample is drawn
is equal to zero. With this assumption, the population central moments, $\nu_r$, are identical to the
corresponding moments $\mu_r$, and we may perform our calculation in terms of the latter. At the end,
however, we replace $\mu_r$ by $\nu_r$ in the final result and so obtain a general expression that is valid even
in cases where $\mu_1 \neq 0$. It can be proved that $E(s^2) = \frac{N-1}{N}\sigma^2$. From this we see that $s^2$ is a biased
estimator of $\sigma^2$, although the bias becomes negligible for large $N$. However, it immediately follows
that an unbiased estimator of $\sigma^2$ is given simply by
\[
\widehat{\sigma^2} = \frac{N}{N-1}\, s^2 , \tag{2.7}
\]
where the multiplicative factor $N/(N-1)$ is often called Bessel's correction. Thus in terms of
the sample values $x_i$, $i = 1, 2, \ldots, N$, an unbiased estimator of the population variance $\sigma^2$ is given
by
\[
\widehat{\sigma^2} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 . \tag{2.8}
\]
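The result $E(s^2) = \frac{N-1}{N}\sigma^2$ quoted above can be checked with the zero-mean trick. What follows is only a sketch, and it assumes (as is usual, although not stated explicitly here) that the sample values $x_i$ are drawn independently from the parent population. With $\mu_1 = 0$ this gives $E(x_i) = 0$, $E(x_i^2) = \mu_2$ and $E(x_i x_j) = 0$ for $i \neq j$.

```latex
\begin{align*}
E(s^2) &= \frac{1}{N}\sum_{i=1}^{N} E(x_i^2)
          - \frac{1}{N^2}\, E\!\left[\Bigl(\sum_{i=1}^{N} x_i\Bigr)^{2}\right] \\
       &= \mu_2 - \frac{1}{N^2}\Bigl[\,\sum_{i=1}^{N} E(x_i^2)
          + \sum_{i \neq j} E(x_i x_j)\Bigr] \\
       &= \mu_2 - \frac{1}{N^2}\,\bigl(N\mu_2 + 0\bigr)
        = \frac{N-1}{N}\,\mu_2 .
\end{align*}
```

Replacing $\mu_2$ by $\nu_2 = \sigma^2$ in the final line recovers $E(s^2) = \frac{N-1}{N}\sigma^2$, and multiplying $s^2$ by $N/(N-1)$ then removes the bias, as in (2.7).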
The variance of the estimator $\widehat{\sigma^2}$ is
\[
\mathrm{Var}(\widehat{\sigma^2}) = \left(\frac{N}{N-1}\right)^{2} \mathrm{Var}(s^2)
= \frac{1}{N}\left(\nu_4 - \frac{N-3}{N-1}\,\nu_2^2\right),
\]
where $\nu_r$ is the $r$-th central moment of the parent population. We note that, since $E(\widehat{\sigma^2}) = \sigma^2$ and
$\mathrm{Var}(\widehat{\sigma^2}) \to 0$ as $N \to \infty$, the statistic $\widehat{\sigma^2}$ is also a consistent estimator of the population variance.
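The variance formula above can be checked numerically. The following is a minimal sketch in Python, assuming a Gaussian parent population (our choice, purely for illustration), for which $\nu_2 = \sigma^2$ and $\nu_4 = 3\sigma^4$, so that the formula reduces to $2\sigma^4/(N-1)$. The function names, the seed, the sample size and the number of trials are all hypothetical.

```python
import random


def unbiased_variance(xs):
    """Equation (2.8): squared deviations from the sample mean, divided by N - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def var_of_unbiased_estimator(nu4, nu2, n):
    """Variance of the estimator (2.8) in terms of the central moments nu_2, nu_4."""
    return (nu4 - (n - 3) / (n - 1) * nu2 ** 2) / n


# Illustrative settings only (assumptions, not taken from the text).
rng = random.Random(0)
N, trials, sigma = 10, 20000, 1.0

# Draw many independent samples of size N and apply the estimator to each.
estimates = [
    unbiased_variance([rng.gauss(0.0, sigma) for _ in range(N)])
    for _ in range(trials)
]

mean_est = sum(estimates) / trials  # should lie close to sigma**2
var_est = sum((e - mean_est) ** 2 for e in estimates) / (trials - 1)

# Gaussian population: nu_2 = sigma^2 and nu_4 = 3 sigma^4.
predicted = var_of_unbiased_estimator(3 * sigma ** 4, sigma ** 2, N)

print(f"mean of estimates: {mean_est:.4f}")
print(f"variance of estimates: {var_est:.4f} (formula: {predicted:.4f})")
```

Increasing `N` in this sketch should shrink both the predicted and the simulated variance, which is the behaviour that makes the estimator consistent.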
The standard deviation $\sigma$ of a population is defined as the positive square root of the
population variance $\sigma^2$ (as, indeed, our notation suggests). It is therefore common practice to take
the positive square root of the variance estimator as our estimator for $\sigma$; that is, we take
\[
\hat{\sigma} = \left(\widehat{\sigma^2}\right)^{1/2} , \tag{2.9}
\]
where $\widehat{\sigma^2}$ is given by either (2.6) or (2.7), depending on whether the population mean $\mu$ is known
or unknown.
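The two cases in (2.9) can be sketched in a few lines of Python. Equation (2.6) is not reproduced in this section, so the sketch assumes it is the known-mean estimator $\frac{1}{N}\sum_i (x_i - \mu)^2$; the function names and the data values are hypothetical.

```python
import math


def variance_estimate(xs, mu=None):
    """Variance estimator for known or unknown population mean.

    If mu is supplied, use the ASSUMED form of (2.6): divide by N.
    Otherwise use (2.8): deviations from the sample mean, divided by N - 1.
    """
    n = len(xs)
    if mu is not None:
        return sum((x - mu) ** 2 for x in xs) / n
    if n < 2:
        raise ValueError("need at least two sample values when the mean is unknown")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def std_estimate(xs, mu=None):
    """Equation (2.9): positive square root of the variance estimator."""
    return math.sqrt(variance_estimate(xs, mu))


# Illustrative sample; its sample mean happens to be exactly 5.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var_unknown = variance_estimate(data)        # mean unknown: divide by N - 1
std_unknown = std_estimate(data)
var_known = variance_estimate(data, mu=5.0)  # mean taken as known: divide by N
std_known = std_estimate(data, mu=5.0)

print(var_unknown, std_unknown)
print(var_known, std_known)
```

For this sample the sum of squared deviations about 5 is 32, so the two variance estimates are 32/7 and 4 respectively; the difference is just Bessel's correction.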
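Using these methods we can consider estimators for the population covariance $\mathrm{Cov}[x, y]$ and
for the correlation $\mathrm{Corr}[x, y]$:
\[
\widehat{\mathrm{Cov}}[x, y] = \frac{N}{N-1}\, V_{xy} ,
\qquad \text{where } V_{xy} = \overline{xy} - \bar{x}\,\bar{y} ,
\]
\[
\widehat{\mathrm{Corr}}[x, y] = \frac{\widehat{\mathrm{Cov}}[x, y]}{\hat{\sigma}_x\, \hat{\sigma}_y} . \tag{2.10}
\]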
In the case in which the means $\mu_x$ and $\mu_y$ are unknown, a suitable (but biased) estimator is
\[
\widehat{\mathrm{Corr}}[x, y] = \frac{N}{N-1}\,\frac{V_{xy}}{s_x s_y} = \frac{N}{N-1}\, r_{xy} , \tag{2.11}
\]
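As an illustration, the covariance estimator and the correlation estimator (2.11) can be evaluated directly from paired sample values. The sketch below is in Python and makes two assumptions: $s_x$ and $s_y$ are the square roots of the (biased) sample variances $s^2$ defined earlier, and $r_{xy} = V_{xy}/(s_x s_y)$, as (2.11) implies. The function name, the dictionary keys and the data are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance_and_correlation(xs, ys):
    """Return V_xy, the covariance estimator, r_xy and the estimator (2.11)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length N >= 2")
    xbar, ybar = mean(xs), mean(ys)
    # V_xy: mean of the products minus the product of the means.
    v_xy = mean([x * y for x, y in zip(xs, ys)]) - xbar * ybar
    # ASSUMPTION: s_x and s_y are the biased sample standard deviations.
    s_x = math.sqrt(mean([x * x for x in xs]) - xbar ** 2)
    s_y = math.sqrt(mean([y * y for y in ys]) - ybar ** 2)
    r_xy = v_xy / (s_x * s_y)
    bessel = n / (n - 1)  # Bessel's correction factor N/(N - 1)
    return {
        "V_xy": v_xy,
        "cov_hat": bessel * v_xy,   # covariance estimator
        "r_xy": r_xy,               # sample correlation
        "corr_hat": bessel * r_xy,  # estimator (2.11)
    }


# Illustrative paired sample (values chosen only to keep the arithmetic simple).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 5.0, 1.0, 4.0, 3.0]
result = covariance_and_correlation(xs, ys)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Here $\bar{x} = \bar{y} = 3$ and $\overline{xy} = 9.2$, so $V_{xy} = 0.2$, and with $N = 5$ the factor $N/(N-1) = 1.25$ gives a covariance estimate of 0.25.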