Bart Hamilton is an instructor in the University of Akron's engineering and science technology department. He retired from Uniroyal Goodrich Tire Co. after more than 35 years in tire research and development. Hamilton earned a master's degree in industrial management from the University of Akron and is ASQ certified as a quality engineer and quality manager.
ONE GOOD IDEA
High Probability of Success
Use Excel to determine if data are normally distributed
By Bart Hamilton
Shape, Location and spread are three attributes of a distribution that need to be determined when working with any data. The first attribute—shape or type of distribution—is needed to know which statistics are appropriate to use. Some of the distributions used for quality purposes include normal, binomial, Poisson, Weibull, exponential and hypergeometric.
Normal is the most frequently occurring type of distribution, but to use the statistics relating to normal distribution, it is necessary to prove the data are, in fact, normally distributed. There are several ways of confirming that fact, including:
- Comparing the data to the empirical distribution of the data using standard deviations and data count. Plus or minus one standard deviation is approximately 68% of the data. Plus or minus two standard deviations is approximately 95.5% of the data. Plus or minus three standard deviations is approximately 99.7% of the data.
- Showing the location of the median and mean to be within one-half standard deviation.
- Analyzing the distribution of the data and comparing it to chi-square distribution.
- Creating a normal probability plot of the data and comparing the result to a straight line. If a straight line is produced, the data are normally distributed.
In the case of the final method, the probability plot is a scatter plot of a calculated plotting position for each data item based on its rank position in the data set and the associated data value. Plotting is done on specialized graph paper, a sample of which is available at www.weibull.com.
One of the axes, usually the horizontal one, is scaled to the normal distribution. Plotting positions are assigned a place on this axis. The vertical axis is linear and is used to plot data values. The scatter plot results from the intersection of the plotting positions and data values.
The difficulty in plotting arises when a computer is used to create the scatter plot. Many graphing programs do not have an option for creating a normal probability plot. This can be overcome by using a transformed value for the plotting position. The transformation is performed by converting the calculated plotting position to a z-value, which is obtained using the Excel function NORMSINV. Then, a linear scatter plot can be constructed by plotting the data values versus the z-value.
The procedure for creating a normal probability plot is:
- Sort the data in ascending order.
- Assign ranks to the data. The minimum data value is assigned a rank of 1.
- Calculate a plotting position using the formula PP = (I - 0.5)/n in which I is the rank and n is the number of data values.
- Convert the plotting position to z-values using the Excel function NORMSINV.
- Construct a scatter plot using the z-values as the horizontal axis and the sorted data value as the vertical axis.
- Check the scatter plot to see if a straight line can represent it. If it can, then the data are normal.
An example of those efforts can be seen in Figure 1.
The plot is made using the sorted data column and the standard deviation in a scatter plot. The conversion of the plot position to standard deviations permits the use of a linear axis for each variable. Thus we see how, with a modicum of effort, Excel can be used to produce a probability plot for determining the normality of data.
Dale H. Besterfield, Quality Control, seventh edition, Pearson Prentice Hall, 2003, pp. 161-163.
Nancy R. Tague, The Quality Toolbox, second edition, Quality Press, 2005, pp. 365-370.