Frequency distributions and histograms
Table 2.2 – Compressive Strength (in psi) of 80 Aluminium-Lithium Alloy Specimens (data from
D. C. Montgomery, G. C. Runger, Applied Statistics and Probability for Engineers)
105 221 183 186 121 181 180 143
97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149
data items within the sample. In this n-dimensional case, we can define the sample covariance
matrix whose elements are
$$V_{kl} = \overline{x^{(k)} x^{(l)}} - \overline{x^{(k)}} \cdot \overline{x^{(l)}},$$
and the sample correlation matrix with elements
$$r_{kl} = \frac{V_{kl}}{s_k s_l}.$$
Both these matrices are clearly symmetric but are not necessarily positive definite.
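As a rough illustration of these definitions, the following Python sketch computes both matrices for a small data set. The data values and the function names are hypothetical, and the sketch assumes that s_k is the square root of the diagonal element V_kk and that the averages are plain arithmetic means over the sample.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def covariance_matrix(columns):
    """V[k][l] = mean(x_k * x_l) - mean(x_k) * mean(x_l) for every pair of variables."""
    means = [mean(col) for col in columns]
    n_vars = len(columns)
    return [
        [
            mean([a * b for a, b in zip(columns[k], columns[l])]) - means[k] * means[l]
            for l in range(n_vars)
        ]
        for k in range(n_vars)
    ]

def correlation_matrix(cov):
    """r[k][l] = V[k][l] / (s_k * s_l), assuming s_k = sqrt(V[k][k])."""
    s = [math.sqrt(cov[k][k]) for k in range(len(cov))]
    return [
        [cov[k][l] / (s[k] * s[l]) for l in range(len(cov))]
        for k in range(len(cov))
    ]

# Hypothetical data: two variables, four observations, with x2 = 2 * x1.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]

V = covariance_matrix([x1, x2])
R = correlation_matrix(V)
det_V = V[0][0] * V[1][1] - V[0][1] * V[1][0]

print("V =", V)
print("R =", R)
print("det V =", det_V)
```

Because the second variable in this made-up sample is an exact multiple of the first, the covariance matrix comes out symmetric but singular (its determinant is zero), which is one way a sample covariance matrix can fail to be positive definite.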
Frequency distributions and histograms
A frequency distribution is a more compact summary of data than a stem-and-leaf diagram. To
construct a frequency distribution, we must divide the range of the data into intervals, which are
usually called class intervals, cells, or bins. If possible, the bins should be of equal width in order
to enhance the visual information in the frequency distribution. Some judgement must be used
in selecting the number of bins so that a reasonable display can be developed. The number of
bins depends on the number of observations and the amount of scatter or variance in the data. A
frequency distribution that uses either too few or too many bins will not be informative. We usually
find that between 5 and 20 bins is satisfactory in most cases and that the number of bins should
increase with n. There are several sets of rules that can be used to determine the number of bins
in a histogram. However, choosing the number of bins approximately equal to the square root of
the number of observations often works well in practice.
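The square-root rule can be tried out on the data of Table 2.2 with a short Python sketch. Two choices below are assumptions made for illustration only: the bin limits are rounded outward to multiples of 10, and each bin includes its lower limit but not its upper limit.

```python
import math

# Compressive strengths (psi) from Table 2.2, row by row.
strengths = [
    105, 221, 183, 186, 121, 181, 180, 143,
    97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178, 76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149, 87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]

n = len(strengths)
n_bins = round(math.sqrt(n))  # square-root rule of thumb

# Assumption: round the limits outward to the nearest multiple of 10.
lower = math.floor(min(strengths) / 10) * 10
upper = math.ceil(max(strengths) / 10) * 10
width = (upper - lower) / n_bins

# Count observations per bin; bin i covers lower + i*width <= x < lower + (i+1)*width.
counts = [0] * n_bins
for x in strengths:
    i = min(int((x - lower) // width), n_bins - 1)  # clamp a value equal to the upper limit
    counts[i] += 1

for i, count in enumerate(counts):
    lo = lower + i * width
    print(f"{lo:.0f} <= x < {lo + width:.0f}: {count}")
```

Changing `n_bins` by hand and rerunning the sketch is a quick way to see how too few or too many bins affects the resulting counts.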
For example, consider the data in Table 2.2. These data are the compressive strengths in
pounds per square inch (psi) of 80 specimens of a new aluminium-lithium alloy
undergoing evaluation as a possible material for aircraft structural elements. The data were
recorded in the order of testing, and in this format they do not convey much information about
compressive strength.
A frequency distribution for the compressive strength data in Table 2.2 is shown in Table 2.3.
Since the data set contains 80 observations, and since √80 ≈ 9, we suspect that about eight to
nine bins will provide a satisfactory frequency distribution. The largest and smallest data values
are 245 and 76, respectively, so the bins must cover a range of at least 245 − 76 = 169 units on
the psi scale. If we want the lower limit for the first bin to begin slightly below the smallest data
value and the upper limit for the last bin to be slightly above the largest data value, we might start
the frequency distribution at 70 and end it at 250. This is an interval or range of 180 psi units. Nine