BACK TO BASICS
Uncovering the Truth
Understanding sources of variation in statistical analysis
by Arved Harding
Have you analyzed a data set with the goal of comparing the standard deviations of two groups? This common statistical objective is usually tackled with tests such as a routine F-test, a more sophisticated Levene-Brown-Forsythe test or Bonett’s test. Some people may even stop to check normality and other assumptions before proceeding with their hypothesis tests. But could there be more to the data?
For example, I worked on a project that had a deeper story than just the p-value from a simple F-test. A parts manufacturer was developing a new process to make its product more consistent and hoped this new method would decrease the variation seen in the key product-performance metric. Eleven samples were made using the new experimental process, and they were compared to available historical data via a traditional F-test.
The F-test p-value of 0.044 indicates a statistically significant difference in the standard deviations at the 5% significance level (see Online Table 1). You could, therefore, conclude that there is sufficient evidence to support the hypothesis that the new method decreases process variation.
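For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of an F-test for two variances. It assumes normally distributed data, a two-sided alternative (the sidedness of the project's test is not stated here) and that SciPy is installed. The function name f_test_two_variances and the synthetic numbers are illustrative assumptions, not the project's data or the software actually used.

```python
import random
import statistics

from scipy import stats


def f_test_two_variances(sample_a, sample_b):
    """Two-sided F-test of H0: var(a) == var(b), assuming normal data."""
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    df_a = len(sample_a) - 1
    df_b = len(sample_b) - 1
    f_stat = var_a / var_b
    # Two-sided p-value: double the smaller tail area, capped at 1.
    lower_tail = stats.f.cdf(f_stat, df_a, df_b)
    upper_tail = stats.f.sf(f_stat, df_a, df_b)
    p_value = min(1.0, 2.0 * min(lower_tail, upper_tail))
    return f_stat, df_a, df_b, p_value


# Synthetic illustration only: these are NOT the data from the project.
rng = random.Random(1)
historical = [rng.gauss(100, 10) for _ in range(32)]
experimental = [rng.gauss(100, 6) for _ in range(11)]

f_stat, df_a, df_b, p_value = f_test_two_variances(historical, experimental)
print(f"F = {f_stat:.2f} on ({df_a}, {df_b}) df, two-sided p = {p_value:.3f}")
```

Note that the test treats all 32 historical points as one homogeneous sample, which is exactly the assumption worth questioning.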
So, what’s wrong with this conclusion? On the surface, it looks correct. The p-value is less than 0.05. Figure 1 shows a histogram overlaying the two data sets, which seems to speak well for the new method. But a closer look at the data and potential sources of variation reveals that the 32 data points in the historical data set are hiding some important issues.
Five different sample-collection dates were used for the historical data (see Online Figure 1), but only one date was used for the experimental data. Collecting data over several different time periods is a good idea because it allows you to check for day-to-day variation or even factors such as variation between work shifts.
The variance component analysis of the historical data showed that 64.4% of the variation was due to the sample-collection date (see Online Table 2). Because the experimental process was run on only one day, there is no estimate of the day-to-day variation for the experimental process. It is possible that if the experimental process were run on a few different days, you would see more overall variation. Although the conclusion drawn from our initial comparison of standard deviations may at first appear statistically sound, it could very well be the wrong conclusion for the data and problem at hand.
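To make the idea of a variance component analysis concrete, here is a minimal sketch that splits total variation into a between-date and a within-date piece. It assumes a one-way random-effects model estimated by the classic ANOVA (method of moments) approach for unequal group sizes. The estimation method behind Online Table 2 is not stated, so software using a different method (REML, for example) may report somewhat different percentages. The function name and the data are hypothetical.

```python
import statistics


def variance_components(groups):
    """One-way random-effects variance components (ANOVA method).

    groups: dict mapping a collection-date label to a list of measurements.
    Returns the between-date variance, the within-date variance and the
    percentage of total variance attributed to collection date.
    """
    sizes = [len(values) for values in groups.values()]
    k = len(groups)
    n_total = sum(sizes)
    grand_mean = sum(sum(values) for values in groups.values()) / n_total

    ss_within = 0.0
    ss_between = 0.0
    for values in groups.values():
        mean = statistics.fmean(values)
        ss_within += sum((y - mean) ** 2 for y in values)
        ss_between += len(values) * (mean - grand_mean) ** 2

    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)
    # Effective group size for unbalanced data.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    # Negative moment estimates are truncated at zero.
    var_between = max(0.0, (ms_between - ms_within) / n0)
    total = var_between + ms_within
    pct_between = 100.0 * var_between / total if total > 0 else 0.0
    return {"between": var_between, "within": ms_within, "pct_between": pct_between}


# Hypothetical measurements for illustration; NOT the project's data.
example = {
    "day 1": [101.0, 97.5, 104.2, 99.8],
    "day 2": [118.3, 112.9, 121.0],
    "day 3": [90.4, 95.1, 88.7, 93.0, 91.6],
}
result = variance_components(example)
print(f"{result['pct_between']:.1f}% of the variance is between collection dates")
```

With only one collection date, k - 1 is zero and the between-date component cannot be estimated at all, which is the situation the experimental data were in.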
What if we took a different approach and compared only the within-collection-date variation for the two methods? From the variance component analysis of the historical data, Online Table 2 shows the standard deviation pooled within collection date to be 6.72. Because the sample sizes varied across collection dates, the degrees of freedom for the historical data are the sum of n - 1 over the five dates, or 32 - 5 = 27.
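The pooling arithmetic is simple enough to sketch. The snippet below weights each date's sample variance by its degrees of freedom (n - 1), sums the degrees of freedom and takes the square root of the weighted average. The function name, the example measurements and the split of 32 points across five dates are hypothetical, because the per-date sample sizes are not listed here.

```python
import math
import statistics


def pooled_within_sd(groups):
    """Return (pooled within-group SD, degrees of freedom).

    groups: list of per-date measurement lists. Dates with a single
    measurement carry no information about spread and are skipped.
    """
    usable = [g for g in groups if len(g) >= 2]
    df = sum(len(g) - 1 for g in usable)
    if df == 0:
        raise ValueError("need at least one date with two or more measurements")
    weighted_ss = sum((len(g) - 1) * statistics.variance(g) for g in usable)
    return math.sqrt(weighted_ss / df), df


# Hypothetical measurements for illustration; NOT the project's data.
by_date = [
    [101.0, 97.5, 104.2, 99.8],
    [118.3, 112.9, 121.0],
    [90.4, 95.1, 88.7, 93.0, 91.6],
]
sd, df = pooled_within_sd(by_date)
print(f"pooled within-date SD = {sd:.2f} on {df} degrees of freedom")

# Any split of 32 points over 5 dates gives 32 - 5 = 27 degrees of freedom.
hypothetical_sizes = [8, 7, 6, 6, 5]
print(sum(n - 1 for n in hypothetical_sizes))
```

Because each date is centered on its own mean before pooling, shifts from one date to the next do not inflate this estimate.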
The 11 samples of the experimental data were collected on only one day, so we used all of the data to estimate the within-collection-date standard deviation, with 10 degrees of freedom. The F-test for the pooled summary data (see Online Table 3) indicated no significant difference between the two methods (p = 0.537). Using this pooled approach, there was no strong evidence that the new method was superior. The team was advised to collect additional sets of data from the experimental method and perform a similar analysis later.
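When only summary statistics are available, the same F-test can be run from the standard deviations and their degrees of freedom. The sketch below assumes a two-sided test and that SciPy is installed. The historical values (6.72 on 27 degrees of freedom) and the experimental degrees of freedom (10) come from the text above, but the experimental standard deviation is not given here, so EXPERIMENTAL_SD_PLACEHOLDER is a made-up stand-in. The result will therefore not reproduce the reported p-value of 0.537.

```python
from scipy import stats


def f_test_from_summary(sd_1, df_1, sd_2, df_2):
    """Two-sided F-test for equal variances from SDs and degrees of freedom."""
    f_stat = (sd_1 / sd_2) ** 2
    lower_tail = stats.f.cdf(f_stat, df_1, df_2)
    upper_tail = stats.f.sf(f_stat, df_1, df_2)
    # Double the smaller tail area and cap the result at 1.
    p_value = min(1.0, 2.0 * min(lower_tail, upper_tail))
    return f_stat, p_value


HISTORICAL_SD = 6.72   # pooled within-date SD reported in the text
HISTORICAL_DF = 27     # 32 points minus 5 collection dates
EXPERIMENTAL_DF = 10   # 11 samples from a single date
EXPERIMENTAL_SD_PLACEHOLDER = 6.0  # hypothetical; substitute the real value

f_stat, p_value = f_test_from_summary(
    HISTORICAL_SD, HISTORICAL_DF, EXPERIMENTAL_SD_PLACEHOLDER, EXPERIMENTAL_DF
)
print(f"F = {f_stat:.2f}, two-sided p = {p_value:.3f}")
```

The key design choice is passing 27 rather than 31 degrees of freedom for the historical data, because the pooled estimate spends one degree of freedom on each collection date's mean.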
To prevent this issue from recurring, always ask questions about data-collection methods—especially if you’re not directly involved in that step. It is important to understand potential sources of variation prior to doing statistical analysis. Remember the top three rules of data analysis: Plot the data, plot the data and plot the data.
Arved Harding is a senior statistical associate with Eastman Chemical Co. in Kingsport, TN, and an adjunct instructor at Northeast State Community College in Blountville, TN. He earned a master’s degree in statistics from Virginia Tech in Blacksburg. Harding is an ASQ senior member and an ASQ-certified quality engineer.