
Frequency distributions and histograms


               Table 2.2 – Compressive Strength (in psi) of 80 Aluminium-Lithium Alloy Specimens (data from
               D. C. Montgomery, G. C. Runger, Applied Statistics and Probability for Engineers)



                 105  221  183  186  121  181  180  143
                  97  154  153  174  120  168  167  141
                 245  228  174  199  181  158  176  110
                 163  131  154  115  160  208  158  133
                 207  180  190  193  194  133  156  123
                 134  178   76  167  184  135  229  146
                 218  157  101  171  165  172  158  169
                 199  151  142  163  145  171  148  158
                 160  175  149   87  160  237  150  135
                 196  201  200  176  150  170  118  149


               data items within the sample. In this n-dimensional case, we can define the sample covariance
               matrix whose elements are
               $$V_{kl} = \overline{x^{(k)} x^{(l)}} - \bar{x}^{(k)} \cdot \bar{x}^{(l)},$$
               and the sample correlation matrix with elements
               $$r_{kl} = \frac{V_{kl}}{s_k s_l}.$$
               Both these matrices are clearly symmetric but are not necessarily positive definite.
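
               As an illustration of the two definitions above, here is a minimal Python sketch that is not
               taken from the source. It assumes that the averages run over the data items of the sample
               (division by N, not N − 1) and that s_k is the square root of V_kk. The function names and the
               three two-component data items are hypothetical.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance_matrix(sample):
    """Return V with V[k][l] = mean(x_k * x_l) - mean(x_k) * mean(x_l).

    `sample` is a list of data items, each a sequence of n components.
    Assumption: averages run over the data items (division by N, not N - 1).
    """
    dim = len(sample[0])
    # columns[k] collects component k of every data item.
    columns = [[item[k] for item in sample] for k in range(dim)]
    means = [mean(col) for col in columns]
    return [
        [
            mean([a * b for a, b in zip(columns[k], columns[l])]) - means[k] * means[l]
            for l in range(dim)
        ]
        for k in range(dim)
    ]


def correlation_matrix(sample):
    """Return r with r[k][l] = V[k][l] / (s_k * s_l), assuming s_k = sqrt(V[k][k]).

    Not defined (raises an error) if some component is constant over the sample.
    """
    V = covariance_matrix(sample)
    s = [sqrt(V[k][k]) for k in range(len(V))]
    return [[V[k][l] / (s[k] * s[l]) for l in range(len(V))] for k in range(len(V))]


# Hypothetical sample: three data items with two components each.
toy_sample = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]
V = covariance_matrix(toy_sample)
r = correlation_matrix(toy_sample)
print(V)
print(r)
```

               Both results are symmetric by construction, and the diagonal of r equals 1 up to rounding.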



                     Frequency distributions and histograms


               A frequency distribution is a more compact summary of data than a stem-and-leaf diagram. To
               construct a frequency distribution, we must divide the range of the data into intervals, which are
               usually called class intervals, cells, or bins. If possible, the bins should be of equal width in order
               to enhance the visual information in the frequency distribution. Some judgement must be used
               in selecting the number of bins so that a reasonable display can be developed. The number of
               bins depends on the number of observations and the amount of scatter or variance in the data. A
               frequency distribution that uses either too few or too many bins will not be informative. We usually
               find that between 5 and 20 bins is satisfactory in most cases and that the number of bins should
               increase with n. There are several sets of rules that can be used to determine the number of bins
               in a histogram. However, choosing the number of bins approximately equal to the square root of
               the number of observations often works well in practice.
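
               The square-root rule and the equal-width binning described above can be sketched in Python as
               follows. This is only an illustration under stated assumptions: the function names are
               hypothetical, the toy data are made up, and each bin is assumed to include its lower limit and
               exclude its upper limit, a convention the text does not fix.

```python
import math


def suggested_bin_count(n_observations):
    """Square-root rule of thumb: about sqrt(n) bins, rounded to an integer."""
    return round(math.sqrt(n_observations))


def frequency_distribution(data, lower, upper, n_bins):
    """Count observations in n_bins equal-width bins covering [lower, upper).

    Assumed convention: a bin contains its lower limit but not its upper limit.
    Returns a list of (bin_lower, bin_upper, frequency) tuples.
    """
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for x in data:
        if not lower <= x < upper:
            raise ValueError(f"{x} lies outside [{lower}, {upper})")
        # Guard against floating-point rounding at the last bin edge.
        index = min(int((x - lower) // width), n_bins - 1)
        counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# Made-up data, chosen only to keep the arithmetic easy to follow.
toy_data = [3, 7, 8, 12, 15, 15, 21, 24, 29]
n_bins = suggested_bin_count(len(toy_data))  # sqrt(9) = 3
for bin_lower, bin_upper, frequency in frequency_distribution(toy_data, 0, 30, n_bins):
    print(f"{bin_lower:5.1f} <= x < {bin_upper:5.1f}: {frequency}")
```

               The rule only gives a starting point; the limits and the final number of bins still call for the
               judgement described in the text.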
                   For example, consider the data in Table 2.2. These data are the compressive strengths in
               pounds per square inch (psi) of 80 specimens of a new aluminium-lithium alloy
               undergoing evaluation as a possible material for aircraft structural elements. The data were
               recorded in the order of testing, and in this format they do not convey much information about
               compressive strength.
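
               To work with these values programmatically, Table 2.2 can be typed in row by row. The short
               Python sketch below does so and computes the few summaries needed when planning the bins: the
               sample size, the extremes, the range, and the square root of n. The variable names are
               illustrative only.

```python
import math

# Table 2.2, typed in row by row (order of testing), values in psi.
compressive_strength_psi = [
    105, 221, 183, 186, 121, 181, 180, 143,
     97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178,  76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149,  87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]

n = len(compressive_strength_psi)
smallest = min(compressive_strength_psi)
largest = max(compressive_strength_psi)
data_range = largest - smallest

print(n, smallest, largest, data_range)  # 80 76 245 169
print(round(math.sqrt(n), 2))            # 8.94
```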
                   A frequency distribution for the compressive strength data in Table 2.2 is shown in Table 2.3.
               Since the data set contains 80 observations, and since $\sqrt{80} \approx 9$, we suspect that about eight to
               nine bins will provide a satisfactory frequency distribution. The largest and smallest data values
               are 245 and 76, respectively, so the bins must cover a range of at least 245 − 76 = 169 units on
               the psi scale. If we want the lower limit for the first bin to begin slightly below the smallest data
               value and the upper limit for the last bin to be slightly above the largest data value, we might start
               the frequency distribution at 70 and end it at 250. This is an interval or range of 180 psi units. Nine

