
Simple linear regression and correlation

Example 4.1. We will fit a simple linear regression model to the oxygen purity
data in Table 2.4. The following quantities may be computed:

\[
n = 20, \qquad \sum_{i=1}^{20} x_i = 23.92, \qquad \sum_{i=1}^{20} y_i = 1843.21,
\]
\[
\bar{x} = 1.1960, \qquad \bar{y} = 92.1605,
\]
\[
\sum_{i=1}^{20} y_i^2 = 170044.5321, \qquad \sum_{i=1}^{20} x_i^2 = 29.2892,
\]
\[
\sum_{i=1}^{20} x_i y_i = 2214.6566,
\]
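\[
S_{xx} = \sum_{i=1}^{20} x_i^2 - \frac{1}{20}\left(\sum_{i=1}^{20} x_i\right)^2
       = 29.2892 - \frac{(23.92)^2}{20} = 0.68088,
\]
and
\[
S_{xy} = \sum_{i=1}^{20} x_i y_i - \frac{1}{20}\left(\sum_{i=1}^{20} x_i\right)\left(\sum_{i=1}^{20} y_i\right)
       = 2214.6566 - \frac{(23.92)(1843.21)}{20} = 10.17744.
\]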
Therefore, the least squares estimates of the slope and intercept are
\[
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.17744}{0.68088} = 14.94748
\]
and
\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 92.1605 - (14.94748)(1.196) = 74.28331.
\]
The fitted simple linear regression model (with the coefficients reported to
three decimal places) is
\[
\hat{y} = 74.283 + 14.947x.
\]
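
The arithmetic of Example 4.1 can be checked with a short script. The following is a minimal sketch that uses only the summary statistics quoted in the example (not the raw data of Table 2.4); the variable names are illustrative choices, not notation from the text.

```python
# Summary statistics quoted in Example 4.1 (oxygen purity data, n = 20).
n = 20
sum_x = 23.92        # sum of x_i
sum_y = 1843.21      # sum of y_i
sum_x2 = 29.2892     # sum of x_i^2
sum_xy = 2214.6566   # sum of x_i * y_i

# Sample means.
x_bar = sum_x / n
y_bar = sum_y / n

# Corrected sum of squares S_xx and corrected sum of cross-products S_xy.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares estimates of the slope and intercept.
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

print(f"S_xx = {s_xx:.5f}, S_xy = {s_xy:.5f}")
print(f"beta1_hat = {beta1_hat:.5f}, beta0_hat = {beta0_hat:.5f}")
```

Rounded to five decimal places, the printed values should agree with the hand calculation in the example.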

Estimating population variance. There is actually another unknown parameter in our
regression model, $\sigma^2$ (the variance of the error term $\varepsilon$). The residuals
$e_i = y_i - \hat{y}_i$ are used to obtain an estimate of $\sigma^2$. The sum of squares of the
residuals, often called the error sum of squares, is
\[
SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2. \tag{4.11}
\]
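
To make the definition concrete, here is a minimal sketch that computes the error sum of squares directly from the residuals. The function names and the four-point data set are hypothetical and chosen only for illustration; they are not the oxygen purity data of Table 2.4.

```python
def fit_line(x, y):
    """Return the least squares intercept and slope for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


def error_sum_of_squares(x, y):
    """Sum of squared residuals e_i = y_i - y_hat_i of the fitted line."""
    beta0_hat, beta1_hat = fit_line(x, y)
    residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
    return sum(e ** 2 for e in residuals)


# Hypothetical toy data (assumption: NOT the data of Table 2.4).
x_toy = [1.0, 2.0, 3.0, 4.0]
y_toy = [2.0, 4.0, 5.0, 8.0]

ss_e_toy = error_sum_of_squares(x_toy, y_toy)
print(f"SS_E = {ss_e_toy:.4f}")
```

For these toy points the fitted line has slope 1.9 and intercept 0, so the residuals are 0.1, 0.2, -0.7 and 0.4.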
We can show that the expected value of the error sum of squares is
$E(SS_E) = (n - 2)\sigma^2$. Therefore an unbiased estimator of $\sigma^2$ is
\[
\hat{\sigma}^2 = \frac{SS_E}{n - 2}. \tag{4.12}
\]
Computing $SS_E$ using Equation (4.11) would be fairly tedious. A more convenient computing
formula can be obtained by substituting $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ into
Equation (4.11) and simplifying. The resulting computing formula is
\[
SS_E = SS_T - \hat{\beta}_1 S_{xy}, \tag{4.13}
\]
where $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$ is the total
sum of squares of the response variable $y$.
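
As an illustration, Equations (4.12) and (4.13) can be applied to the summary statistics of Example 4.1. This is a sketch under the assumption that the rounded values quoted in the example are used as inputs; the variable names are illustrative.

```python
# Quantities quoted in Example 4.1 (rounded as printed there).
n = 20
sum_y = 1843.21          # sum of y_i
sum_y2 = 170044.5321     # sum of y_i^2
s_xy = 10.17744          # corrected sum of cross-products
beta1_hat = 14.94748     # least squares slope

y_bar = sum_y / n

# Total sum of squares: SS_T = sum(y_i^2) - n * y_bar^2.
ss_t = sum_y2 - n * y_bar**2

# Error sum of squares from the computing formula, Equation (4.13).
ss_e = ss_t - beta1_hat * s_xy

# Unbiased estimate of the error variance, Equation (4.12).
sigma2_hat = ss_e / (n - 2)

print(f"SS_T = {ss_t:.4f}")
print(f"SS_E = {ss_e:.4f}")
print(f"sigma2_hat = {sigma2_hat:.4f}")
```

With these rounded inputs, $SS_E$ comes out at about 21.25 and $\hat{\sigma}^2$ at about 1.18. Because $SS_T$ is a difference of two large, nearly equal numbers, the result is sensitive to rounding in the inputs.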
