DATA ANALYSIS TOOL
Getting the Luster Back
The underused cumulative sum sequence plot can shine the way
by James M. Lucas and Ronald D. Snee
A textile fiber manufacturing plant was having a production yield problem. On occasion, the plant was producing yarn that was not bright enough. The low-luster yarn was recycled, thus reducing yields and increasing manufacturing costs. When low-luster yarn was produced from a vessel on a 30-vessel machine, that vessel was removed, its screens were cleaned and the vessel was replaced. The plant had been experiencing a higher replacement rate for vessels than it had observed in the past.
A plant scientist working on the low-luster problem suspected that it originated in the flake, which was produced using river water and then melted to produce the yarn. Many plants were located upstream, including paper plants, and any of them could have been adding substances to the river water that affected the purity of the flake. Plant chemists had analyzed the river water, but they couldn’t explain how it was causing the low-luster problem.
Data and initial analysis
For more than a year (390 days), the plant scientist collected daily data on the number of vessels removed and the amount of water treatment needed to reduce the turbidity of the river water to an acceptable level. The data were lagged by one week because there was a one-week delay between flake production and its use on the spinning machine.
The first step in the analysis was to create a sequence (time) plot of the number of vessels removed per day versus day (Figure 1) to look for nonrandom patterns that might suggest potential sources of the problem. No major trends were apparent in the plot, although the smoothed (average) trend showed a slight decrease in the number of vessels removed beginning at about day 216. This was encouraging, but what caused this downward trend?
Next, a scatter plot of the vessels removed versus the water treatment level was created (Figure 2). The plant scientist could not see any significant relationship in this scatter plot: the trend was in the expected direction, but it was not large enough to be statistically significant.
At this point, it was clear that the data showed no major relationships and contained considerable variation. Plant management decided to ask one of the authors, Jim Lucas, to analyze the data and identify what additional data might be needed to solve the problem.
The CUSUM sequence plot
After reviewing the results of the sequence plot (Figure 1) and the scatter plot (Figure 2), it was clear that a different approach was needed. Lucas had seen that the cumulative sum (CUSUM) sequence plot was effective in dealing with small trends and noisy data. The CUSUM sequence plot is associated with the CUSUM process control technique.1,2
A CUSUM monitoring procedure is typically used to decide whether a process is in a state of statistical control. Here, the CUSUM sequence plot is used instead as a diagnostic data analysis tool: to determine when shifts and trends occurred in a process and to study relationships among different factors. This use of the CUSUM differs from the usual one, and its value has not been widely recognized.
Before applying a CUSUM to the low-luster problem, it is helpful to review how the CUSUM sequence plot is constructed and interpreted.
Online Table 1 and the plots in Figure 3 and Online Figure 1 illustrate how a CUSUM sequence plot is constructed. The CUSUM sequence plot shows the cumulative sum of the deviations from a target or reference value versus the sequence number.3 Online Table 1 shows the calculation. For observation t with target T, the plotted value is S_t = (x_1 - T) + (x_2 - T) + ... + (x_t - T), so each new point adds the newest deviation to the previous total.
In the example data (Online Table 1, Figure 3 and Online Figure 1), there is a downward shift at about observation six. The shift is even more apparent in the CUSUM sequence plot than in the raw data (Online Table 1 and Figure 3). The ability to quickly detect small differences and to pinpoint when shifts occurred is a major strength of CUSUM sequence plots.
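The calculation can be sketched in a few lines of Python. This is a minimal sketch assuming the standard definition S_t = (x_1 - T) + ... + (x_t - T) for a target T; the function name `cusum` is ours, and the data are hypothetical values invented for illustration, not the values in Online Table 1.

```python
from itertools import accumulate


def cusum(values, target):
    """Return [S_1, ..., S_n], where S_t = sum of (x_i - target) for i <= t."""
    return list(accumulate(x - target for x in values))


# Hypothetical data: level near 10 for six observations, then near 9
data = [10, 11, 10, 9, 11, 10, 9, 8, 9, 9, 8, 9]
s = cusum(data, target=10)
print(s)  # [0, 1, 1, 0, 1, 1, 0, -2, -3, -4, -6, -7]
```

Plotting s against the observation numbers 1 to 12 gives a CUSUM sequence plot that wanders near zero for six observations and then falls steadily.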
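A shift like this can also be located numerically. One common heuristic, assumed here rather than taken from the article, is to use the overall mean as the reference value: the CUSUM then drifts up while the data sit above the mean and down afterward (or the reverse), so the observation where |S_t| is largest estimates the last point before a single step shift. The helper `estimate_shift` and the data are hypothetical.

```python
from itertools import accumulate
from statistics import mean


def cusum(values, target):
    """Cumulative sum of deviations from target."""
    return list(accumulate(x - target for x in values))


def estimate_shift(values):
    """Estimate the last observation (1-based) before a single step shift.

    Uses the overall mean as reference, so |S_t| peaks where the level changes.
    """
    s = cusum(values, mean(values))
    peak = max(range(len(s)), key=lambda i: abs(s[i]))
    return peak + 1


# Hypothetical data: level near 10, then near 9
data = [10, 11, 10, 9, 11, 10, 9, 8, 9, 9, 8, 9]
print(estimate_shift(data))  # 6
```

This single-shift estimate is only a starting point; with several shifts, the plot itself is the better guide.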
Interpreting sequence plots
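The slope of each segment of a CUSUM sequence plot shows where the data mean lies relative to the target value: a rising segment indicates a mean above the target, a falling segment indicates a mean below the target and a flat segment indicates a mean on target. The average slope of a segment equals the segment mean minus the target. Some example slopes are shown in Online Figure 2.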
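The slope rule follows from the definition of the CUSUM: over observations i to j, the plot changes by S_j - S_(i-1) = (x_i - T) + ... + (x_j - T), so the average slope is that sum divided by j - i + 1, which is the segment mean minus T. The sketch below checks this numerically on hypothetical data; `segment_slope` is an illustrative helper, not from the article.

```python
from itertools import accumulate
from statistics import mean


def cusum(values, target):
    """Cumulative sum of deviations from target."""
    return list(accumulate(x - target for x in values))


def segment_slope(s, first, last):
    """Average CUSUM slope over observations first..last (1-based, inclusive)."""
    before = s[first - 2] if first > 1 else 0  # S_(first-1), taking S_0 = 0
    return (s[last - 1] - before) / (last - first + 1)


data = [10, 11, 10, 9, 11, 10, 9, 8, 9, 9, 8, 9]  # hypothetical
T = 10
s = cusum(data, T)
for first, last in [(1, 6), (7, 12)]:
    slope = segment_slope(s, first, last)
    segment_mean_minus_target = mean(data[first - 1:last]) - T
    print(first, last, round(slope, 3), round(segment_mean_minus_target, 3))
```

The first segment rises gently (mean slightly above 10) and the second falls steeply (mean about 1.33 below 10), and each printed slope matches the corresponding segment mean minus the target.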
The analysis of the low-luster data started by constructing separate CUSUM sequence plots for the number of vessels removed and for the level of water treatment (Online Figures 3 and 4). The target value for each plot was the overall average of its series (0.5 for vessels removed and 179 for water treatment). The overall mean usually is a good reference (target) value for an initial CUSUM sequence plot.
With this choice of reference value, the CUSUM starts and ends at zero, because the deviations from the mean sum to zero. The CUSUM sequence plots show that the major change point in both variables occurred at about the same time (about day 216).
Online Figure 5 shows eight different time segments, identified by eyeballing the different slopes of the vessels-removed CUSUM sequence plot. The two CUSUM sequence plots (Online Figures 3 and 4) correspond less closely for this eight-segment model than for a two-segment model with one major shift, because some short-term level changes in the vessels-removed CUSUM sequence plot do not correspond to changes in the water treatment CUSUM sequence plot.
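Both points can be checked with a short sketch. Because the deviations from the overall mean sum to zero, a CUSUM built around the mean returns to zero at the last observation (up to floating-point rounding). The two series below are hypothetical stand-ins on very different scales, loosely mimicking vessels removed (mean near 0.5) and water treatment (mean near 179); both drop after day 8. Using the peak of |S_t| to mark each change point is our illustrative assumption, not the authors' stated procedure.

```python
from itertools import accumulate
from statistics import mean


def cusum(values, target):
    """Cumulative sum of deviations from target."""
    return list(accumulate(x - target for x in values))


def shift_point(values):
    """1-based day where |CUSUM about the mean| peaks (single-shift estimate)."""
    s = cusum(values, mean(values))
    return max(range(len(s)), key=lambda i: abs(s[i])) + 1


# Hypothetical series on very different scales, both dropping after day 8
vessels = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
treatment = [190, 185, 192, 188, 186, 191, 189, 187,
             170, 172, 168, 171, 169, 173, 170, 171]
for name, series in [("vessels", vessels), ("treatment", treatment)]:
    s = cusum(series, mean(series))
    # Final CUSUM value is zero; both change points land on day 8
    print(name, s[-1], shift_point(series))
```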
To summarize, water treatment was used to reduce the turbidity of the river water. The amount of treatment chemical added depends on the amount of turbidity present: as the turbidity goes up, more chemical is added, and the number of vessels removed increases as well.
A plot of the vessels removed versus water treatment level shows a weak positive correlation (Figure 2). The CUSUM sequence plot for the vessels removed and the water treatment shows a close correspondence for the major level change for both variables.
It is now clear that the water treatment does not remove all the impurities associated with low luster, and the remaining impurities lead to vessel removal. A different water source with fewer impurities is needed.
A solution? Use well water. The analysis convinced the plant to drill a well and use well water for the flake production. Using well water solved the low-luster problem. When well water was used for flake production, the vessels removed remained at a low level.
A few years later, the plant was thinking of saving costs by closing the well and using river water for flake production. The statistician working on the project recalled that Lucas had done the work to justify drilling the well. After conferring with him, the plant dropped that cost-saving idea. It is amazing how short organizational memory can be.
A project value in terms of dollars saved was not computed, but the impact of the project was clearly major. Replacing a vessel means removing the old vessel and replacing it with a cleaned, refurbished vessel.
If 0.5 vessels per day are being replaced, a certain level of cleaning and refurbishing staff is needed. A higher removal rate either overworks the current staff or requires additional staff and, perhaps, more spare vessels, which increases staffing and management costs. In addition, the low-luster yarn was recycled, which lowered yields and added manufacturing costs.
Fitting a linear model
The CUSUM sequence plot shows a major shift at about day 216. Using an indicator variable (which we named “two groups”) for this shift gives a vector that is zero for days 1 to 216 and one for days 217 to 390. This indicator is a significant term in the linear model summarized in Online Table 2.
The three-term model contains the “two groups” indicator variable, the water treatment term and their interaction, which allows the water treatment slope to differ between the two groups. The two groups term, found from the CUSUM sequence plot, is significant at the 0.001 level, while the other two model terms are not significant. CUSUM sequence plots can help you in your model-building efforts.
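The indicator and the three-term model can be sketched as follows. Writing G for the two groups indicator and W for water treatment, the model y = b0 + b1*G + b2*W + b3*G*W with a 0/1 indicator fits the same two lines as separate least-squares regressions of y on W within each group, so this sketch recovers the four coefficients from two simple regressions. It computes estimates only; the significance tests in Online Table 2 would come from a statistics package. The function names and the noise-free demo data are hypothetical.

```python
def two_groups_indicator(n_days, last_day_before_shift=216):
    """Hypothetical helper: 0 for days 1..216, 1 for later days."""
    return [0 if day <= last_day_before_shift else 1 for day in range(1, n_days + 1)]


def simple_fit(x, y):
    """Least-squares (intercept, slope) for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_group_model(g, w, y):
    """Estimate (b0, b1, b2, b3) in y = b0 + b1*g + b2*w + b3*g*w for a 0/1 g."""
    a0, s0 = simple_fit([wi for gi, wi in zip(g, w) if gi == 0],
                        [yi for gi, yi in zip(g, y) if gi == 0])
    a1, s1 = simple_fit([wi for gi, wi in zip(g, w) if gi == 1],
                        [yi for gi, yi in zip(g, y) if gi == 1])
    # Group 0 line: b0 + b2*w; group 1 line: (b0 + b1) + (b2 + b3)*w
    return a0, a1 - a0, s0, s1 - s0


# Noise-free hypothetical data generated from known coefficients
g = two_groups_indicator(390)
w = [150 + (day * 37) % 60 for day in range(1, 391)]
y = [0.6 - 0.3 * gi + 0.001 * wi + 0.0005 * gi * wi for gi, wi in zip(g, w)]
print([round(b, 6) for b in fit_group_model(g, w, y)])
```

Because the demo data contain no noise, the fit recovers the generating coefficients; with real data, b1 estimates the shift in level between the two groups.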
A useful tool
A CUSUM sequence plot is a useful and underused data analysis tool. It is helpful for identifying small shifts in level in noisy data. It is easy to compute and interpret. The graphics are a great aid in interpreting the results.
A CUSUM sequence plot also helps identify when process shifts occur. In Online Figures 3-5, you can clearly see where the shifts separating the eight segments occurred during the 390-day period, whether each shift is up or down, and the relative sizes of the shifts.
The low-luster case also shows that CUSUM can be helpful when dealing with low count data. In this case, the number of vessels removed varied from zero to four per day (mean = 0.5, median = 0). The counts for days with zero, one, two, three and four vessels removed were 243, 110, 27, nine and one, respectively. Hence, 91% of the days (353/390) had zero or one vessel removed. It is difficult to see trends and differences with such small counts.
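The summary statistics quoted above follow directly from the day counts, as this short sketch shows; the counts are the ones reported in the text.

```python
from statistics import median

# Days with 0, 1, 2, 3 and 4 vessels removed, as reported in the text
counts = {0: 243, 1: 110, 2: 27, 3: 9, 4: 1}

days = sum(counts.values())
total_removed = sum(k * c for k, c in counts.items())
daily_values = [k for k, c in sorted(counts.items()) for _ in range(c)]
share_zero_or_one = (counts[0] + counts[1]) / days

print(days, total_removed / days, median(daily_values), round(100 * share_zero_or_one))
# 390 0.5 0.0 91
```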
The CUSUM sequence plot also can help you develop models. The CUSUM sequence plot shows a major shift at about day 216. If you divide the data into two groups (days 1 to 216, and days 217 to 390) and compare the means of the two groups with a t-test, you find a statistically significant difference of -0.28 vessels removed per day (p < 0.001).
If you include the terms for water treatment and the interaction between group and water treatment in the model, you still find a significant difference between the groups (p = 0.001). The water treatment and water treatment by group interaction effects were not statistically significant. Thus, seeing where in the sequence a shift has occurred suggests a model that can be used to identify statistically significant effects.
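The two-group comparison can be sketched with the Python standard library. The article reports a t-test without naming the variant; this sketch assumes Welch's unequal-variance t statistic and, because the standard library has no t distribution, approximates the two-sided p-value with the normal distribution, which is reasonable with roughly 200 observations per group. The helper name and the daily data are hypothetical, so the output will not reproduce the -0.28 effect.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_group_comparison(y, last_day_before_shift=216):
    """Return (later mean - earlier mean, Welch t statistic, approx. two-sided p).

    The p-value uses a normal approximation to the t distribution (an
    assumption that is reasonable for groups of a few hundred days).
    """
    before, after = y[:last_day_before_shift], y[last_day_before_shift:]
    diff = mean(after) - mean(before)
    se = sqrt(variance(before) / len(before) + variance(after) / len(after))
    t = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, t, p


# Hypothetical daily counts: mean 0.5 for 216 days, then mean 1/3 for 174 days
y = [0, 1] * 108 + [0, 0, 1] * 58
diff, t, p = two_group_comparison(y)
print(round(diff, 3), round(t, 2), round(p, 4))
```

A negative difference means fewer vessels removed per day after the shift, matching the direction reported for the low-luster data.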
Consider making use of a CUSUM sequence plot the next time you are tasked with interpreting sequential data.
1. James M. Lucas, “The Design and Use of V-Mask Control Schemes,” Journal of Quality Technology, Vol. 8, No. 1, 1976, pp. 1-12.
2. Douglas C. Montgomery, Introduction to Statistical Quality Control, John Wiley and Sons, 2012.
3. Lucas, “The Design and Use of V-Mask Control Schemes,” see reference 1.
© 2019 James M. Lucas and Ronald D. Snee.
James M. Lucas is the principal at J.M. Lucas and Associates in Wilmington, DE. He has a doctorate in statistics from Texas A&M University in College Station. Lucas is an ASQ fellow and a recipient of ASQ’s Shewhart Medal and Brumbaugh Award.
Ronald D. Snee is president of Snee Associates LLC in Newark, DE. He has a doctorate in applied and mathematical statistics from Rutgers University in New Brunswick, NJ. Snee has received ASQ’s Shewhart, Grant and Distinguished Service Medals. He is an ASQ honorary member and an academician in the International Academy for Quality.