3.4 PER MILLION
Cumulative sum control chart concepts
help catch models on the edge
by Joseph D. Conklin
My friend Ken and I go back more than 30 years. We met at Acme Reengineering (names and data in this column have been changed to maintain the real organization’s confidentiality). Acme was the main parts supplier to the local automotive engine rebuilding industry.
It was my first real job out of college. I had the quality degree and title, but no experience. Ken had the experience in quality, although his official title and department were called something else. After some years, both of us were voluntarily drafted into Acme’s lean Six Sigma program.
Ken was put in charge of the daily project details. I was his unofficial answer man for statistical questions. One day, I caught Ken looking downcast. I asked what the problem was. "Model breakdown," was all he could get out.
"What customer?" I asked, figuring some parts for a rebuild order were the headache du jour. He showed me the paper in his hand, and I saw at once he meant something else.
"Old Faithful," one of the statistical models we installed as part of the lean Six Sigma program, no longer worked well (see Figure 1). It was Acme’s model for forecasting sales. Acme used it to know when to order materials or schedule production.
The model had worked extremely well for more than two years. That was how it got the nickname "Old Faithful." It had gone more and more off the rails for the previous six months, seriously underestimating orders.
On one hand, this was good for Acme because it meant more business was coming in. On the other hand, it meant a lot of delays in filling orders because there wasn’t enough material. Ken’s first priority was to fix the forecasts.
Before we started on that, Ken asked me a broader question. He pointed out that the problem crept up on Acme gradually, becoming obvious only in the past few months. He wanted to know whether there was any way to detect this kind of problem sooner. The lean Six Sigma program involved several more models, and the idea of some regular, simple check up for trouble was sounding very attractive at that moment.
Problems with models
"This was a simple question with a complicated answer," I replied. The answer depends on the number of models, the type of breakdown and the other commitments of the staff who are able to maintain and troubleshoot the models.
All statistical models should be periodically reviewed. For a few models, an individual or small team can establish a schedule to ensure each model is regularly checked.
Not all breakdowns come with warnings. Sometimes, the world accelerates so quickly and with so much volatility, it exceeds our ability to model it. Figure 2 shows one possible example. In the pure, extreme version of this situation, there is no model to break down. The priority becomes deciding when the new environment has settled enough to make modeling practical.
Figure 3 shows the case of a sudden shock, followed by a quick return to normal. This situation will capture the organization’s attention and may eventually be accounted for. But there is little or no warning. The priority becomes deciding the likelihood of recurrence and selecting the best response for the next time.
A gradual breakdown, such as the one in Figure 1, offers some warning. If there are more models than staff or time available, some form of automated check is desirable to focus attention on the models most in danger of gradually breaking down.
For this situation, cumulative sum control charts offer one option that is easy to program in an electronic spreadsheet or statistical computing package [1].
Cumulative sum control chart concept
Cumulative sum control charts are designed to detect significant shifts from a process target. Imagine process measurements X1, X2 to Xn measured at times t1, t2 to tn. Assume T is the target value for the process. An intuitive measure of shift is to subtract T from each process measure and sum the deviations over all times up to t: Σt (Xi - T).
For an on-target process, Σt (Xi - T) moves randomly around zero. If the process shifts up, Σt (Xi - T) turns and remains positive. In the case of a shift down, Σt (Xi - T) turns and remains negative. Σt (Xi - T) is the basis for a cumulative sum control chart and gives the chart its name [2].
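As a minimal sketch of this running sum, assuming Python rather than the spreadsheet used for this column, the function below accumulates the deviations from the target. The function name and the measurements are hypothetical and chosen only to show the behavior: the sum hovers near zero while the process is on target and climbs steadily after an upward shift.

```python
from itertools import accumulate


def cumulative_deviations(measurements, target):
    """Return the running sum of (x - target) after each measurement."""
    return list(accumulate(x - target for x in measurements))


# Hypothetical data: on target (10) for five periods, then shifted up by about 1.
on_target = [10.2, 9.7, 10.1, 9.9, 10.1]
shifted = [11.1, 10.8, 11.2, 10.9, 11.0]
sums = cumulative_deviations(on_target + shifted, target=10)

# The first five sums stay close to zero; the last five grow with every period.
for period, total in enumerate(sums, start=1):
    print(period, round(total, 2))
```

A sum that keeps moving in one direction, rather than wandering around zero, is the visual cue the chart is built on.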
To adapt the calculations behind cumulative sum control charts, start with three quantities. The first two are the actual values observed for times t1 to tn, the Ai, and the values from the statistical model you would like to check, the Mi. From these two, you obtain the third quantity, the residuals, ri. For time period ti, ri = Ai - Mi [3].
For a model that fits well, tradition prescribes four expectations for the residuals [4]. We combine one of these, random variation around zero, with the cumulative sum control chart concept, to produce a helpful indicator of model breakdown. In this case, the target T is zero. The indicator of model breakdown is a sustained shift of the residuals from zero.
Model data for cumulative sum concept
The example data in Online Table 1 builds to our next step.
The model forecasts sales in terms of two predictor variables: the actual sales at the prior time period and whether the forecast falls during the peak season. The predictor for the peak season is coded as a dummy variable, with a value of zero for not-in-peak season and a value of one otherwise [5].
The model equation is forecast sales = 25.0932 + (0.6413*prior sales) + (173.6525*peak season) [6]. The example data in Online Table 1 start after the period of data used to fit the model.
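The model equation and the residual definition can be written out as a short sketch. The coefficients are the rounded values printed above, so results may differ slightly from Online Table 1, which was computed with more precision. The function names and the input values in the usage lines are hypothetical.

```python
INTERCEPT = 25.0932
PRIOR_SALES_COEF = 0.6413
PEAK_SEASON_COEF = 173.6525


def forecast_sales(prior_sales, in_peak_season):
    """Forecast sales from the prior period's sales and the peak-season dummy."""
    peak = 1 if in_peak_season else 0  # dummy variable: 0 = not in peak season
    return INTERCEPT + PRIOR_SALES_COEF * prior_sales + PEAK_SEASON_COEF * peak


def residual(actual, forecast):
    """Residual as defined in this column: actual minus model value."""
    return actual - forecast


# Hypothetical inputs, not taken from Online Table 1.
off_peak = forecast_sales(prior_sales=100, in_peak_season=False)
in_peak = forecast_sales(prior_sales=100, in_peak_season=True)
print(round(off_peak, 4), round(in_peak, 4))
print(round(residual(actual=95, forecast=off_peak), 4))
```

A positive residual means the model underestimated sales for that period, which is the direction of the trouble Ken was seeing.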
After obtaining the residuals, you need answers to four questions:
- What is the standard error of prediction (se)?
- What is the acceptable risk of falsely declaring a breakdown (α risk)?
- What is the acceptable risk of not detecting a breakdown (β risk)?
- How large of a sustained deviation from zero (called k here) counts as a breakdown?
What is the standard error of prediction?
The standard error of prediction, symbolized as σe, is a type of standard deviation and a measure of the inherent uncertainty in the model [7]. It is estimated from the model residuals. The estimate of σe is symbolized as se. Typically, se is computed from the same data used to fit the model.
In the example here, the model has been fit with prior data, and the data in Online Table 1 begins after that. We compute se in an alternative way, using the data in Online Table 1 and not the data used to fit the model. Our computation allows se to evolve and stabilize as more data become available. We do this for three reasons:
- It handles the case when the data used to fit the model are not available.
- An se not based on the data used to fit the model has advantages [8].
- It gives the model a chance to run before declaring a breakdown [9].
Our alternative calculation is explained in Online Table 2. It is an example of estimating the standard deviation from the moving range of the data. This technique is part of constructing individuals and moving range control charts in statistical process control [10].
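One common form of the moving range estimate divides the average absolute difference between consecutive residuals by the control chart constant d2 = 1.128. The sketch below assumes that form and recomputes the estimate each period so it can evolve as data arrive. The exact layout in Online Table 2 may differ, and the function name and sample residuals are hypothetical.

```python
D2_FOR_SPAN_2 = 1.128  # control chart constant d2 for moving ranges of two points


def running_se(residuals):
    """Estimate se at each period from the average moving range so far.

    The result has one entry per residual. The first entry is None because
    a single residual does not yet give a moving range.
    """
    if not residuals:
        return []
    estimates = [None]
    total_range = 0.0
    for t in range(1, len(residuals)):
        total_range += abs(residuals[t] - residuals[t - 1])
        average_range = total_range / t  # t moving ranges are available so far
        estimates.append(average_range / D2_FOR_SPAN_2)
    return estimates


# Hypothetical residuals, not taken from Online Table 1.
se_by_period = running_se([1.0, -1.0, 2.0, 0.0])
print(se_by_period)
```

Early estimates rest on very few moving ranges and can jump around, which is one reason to let the model run for a while before acting on a signal.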
What is the acceptable risk of falsely declaring a breakdown (α risk)? What is the acceptable risk of not detecting a breakdown (β risk)?
There are two basic risks in testing for a model breakdown: incorrectly declaring a breakdown and failing to detect one when it occurs.
The first kind of risk is called α risk for short. The second kind is called β risk. Both are expressed as probabilities between 0 and 1.
The calculations for our model breakdown indicator require fixing values for α and β. A reasonable range of values for α is 0 to 0.10. A reasonable range of values for β is 0 to 0.20. For our example, we use α = 0.05 and β = 0.15 [11].
The reasonable range for α is narrower than for β because in the context of this application, Acme considers α the higher risk. Acme bases this conclusion on:
- Its estimated cost of responding to a false alarm.
- Its belief that the present trouble with meeting orders has raised its awareness such that the next breakdown will not remain undetected for long.
How far of a sustained shift from zero counts as a breakdown?
A sustained shift of the residuals, our determinant of model breakdown, is measured in terms of se. A useful rule in many applications is to set k, the level of shift that signals a breakdown, between 0.25se and 1.50se.
A shift less than 0.25se in practice may be too small to represent serious damage to the model. If the interest is in shifts larger than 1.50se, more basic control chart techniques are usually appropriate [12]. In our example, we use k = 0.75se.
Putting all the pieces together
As explained in more detail in Online Table 3, we put all the pieces together to construct the model breakdown detector as follows for time period t and residual rt:
- Compute -[(k*se)/2] - rt and update S(t)low.
- Compute rt - [(k*se)/2] and update S(t)high.
- Update ht and compare to S(t)low and S(t)high.
- Conclude the model has broken down in a negative direction if S(t)low > ht.
- Conclude the model has broken down in a positive direction if S(t)high > ht.
In our example, there is evidence the model has broken down in a positive direction at time period 40.
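The detector steps can be sketched in code under several stated assumptions, since the full calculation lives in Online Table 3. The sketch treats k as the multiplier 0.75, so the shift of interest is k*se. It assumes each sum is updated by adding the computed quantity and resetting to zero when the result would be negative, as in a standard tabular cumulative sum. It also assumes ht = (se/k) * ln((1 - β)/α), a common approximation from sequential testing that may not match the column's own formula. The function name and test residuals are hypothetical.

```python
import math


def detect_breakdown(residuals, se_values, k=0.75, alpha=0.05, beta=0.15):
    """Return (period, direction) for the first signal, or None if none occurs.

    Periods are numbered from 1. se_values holds the estimate of the standard
    error available at each period; None means no estimate is available yet.
    """
    s_low = 0.0
    s_high = 0.0
    for period, (r, se) in enumerate(zip(residuals, se_values), start=1):
        if se is None or se <= 0:
            continue  # no usable estimate yet, so give the model the benefit of the doubt
        reference = (k * se) / 2
        # Assumed update rule: accumulate, but never let a sum drop below zero.
        s_low = max(0.0, s_low + (-reference - r))
        s_high = max(0.0, s_high + (r - reference))
        # Assumed decision interval, recomputed because se can change each period.
        h = (se / k) * math.log((1 - beta) / alpha)
        if s_low > h:
            return period, "negative"
        if s_high > h:
            return period, "positive"
    return None


# Hypothetical residuals: small noise for four periods, then a sustained jump of 2.
residuals = [0.5, -0.5, 0.3, -0.2] + [2.0] * 10
signal = detect_breakdown(residuals, se_values=[1.0] * len(residuals))
print(signal)  # (7, 'positive') for these inputs
```

With se held at 1.0, the assumed ht is about 3.78, and the high-side sum crosses it three periods after the jump begins. Smaller α or smaller k raises ht and delays the signal.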
Ken used the model breakdown technique to determine when the sales forecast deterioration set in. If a model has broken down, the response depends on which of the three basic contexts described in the opening section applies.
In the first context, the appropriate response is to understand better how the world is changing before improving the model. In the second context, the first priority should be determining whether the shock represents a real event and not an error in the data-collection process.
In the third context, gradual breakdown, the problem-solving strategies for diagnosing out-of-control points on a control chart are a reasonable next step.
If more than one model is of concern, the preceding technique can be implemented in a computer program to check each one on a predetermined schedule and to flag any for investigation if a breakdown is threatening. This focuses limited staff time on the models most in danger of breaking down.
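A scheduled check of several models might be organized as in the sketch below. It assumes each model's residuals are already collected under a name, and it accepts any detector function that returns None when there is no signal. The stand-in detector here is a deliberately simple placeholder, not the detector described in this column, and the model names and residuals are hypothetical.

```python
def flag_models(residuals_by_model, detector):
    """Run the detector on every model and return only the flagged ones."""
    flagged = {}
    for name, residuals in residuals_by_model.items():
        result = detector(residuals)
        if result is not None:
            flagged[name] = result
    return flagged


def placeholder_detector(residuals, limit=5.0):
    """Placeholder check: first period where the plain running sum passes the limit."""
    total = 0.0
    for period, r in enumerate(residuals, start=1):
        total += r
        if abs(total) > limit:
            return period
    return None


# Hypothetical residuals for two models.
models = {
    "sales": [0.2, -0.1, 2.5, 2.0, 2.6],
    "returns": [0.3, -0.2, 0.1, -0.3, 0.2],
}
flagged = flag_models(models, placeholder_detector)
print(flagged)  # {'sales': 5}
```

Passing the detector in as an argument keeps the scheduling loop separate from the statistics, so the choice of α, β and k can vary by model.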
The column is based on a true story involving a real organization. Names and data have been changed to maintain the organization’s confidentiality.
References and Notes
1. The calculations in this article are carried out in Microsoft Excel.
2. For more information on cumulative sum control charts, see Douglas C. Montgomery, Introduction to Statistical Quality Control, seventh edition, John Wiley and Sons, 2013. Because the emphasis in this column is on calculations that can be programmed, the V mask (graphical version) of the charts is not discussed.
3. Residuals play a critical role in assessing the quality of statistical models. For more information, see Norman R. Draper and Harry Smith’s Applied Regression Analysis, third edition, John Wiley, 1998. The residuals can just as easily be defined as ri = Mi - Ai. The important thing is to choose a definition and stay consistent throughout the discussion.
4. These are (1) random variation around zero, (2) independence, (3) constant variance and (4) close approximation to a normal distribution. Independence equates to the present residual conveying no information about how the next one will behave. Constant variance means the basic uncertainty of the model is the same over the ranges of the predictor variables. Normal distribution is not a strict requirement for a well-performing model but permits convenient and precise expressions of the basic uncertainty. For more information, see Draper and Smith’s Applied Regression Analysis, noted in reference 3.
5. Dummy variables are a common way to incorporate qualitative variables into models. For more information, see Draper and Smith’s Applied Regression Analysis, noted in reference 3.
6. The calculations for the model coefficients were carried out with more precision than that displayed in Online Table 1 and elsewhere.
7. For a more detailed discussion of the standard error of prediction, including how to calculate it, see Draper and Smith’s Applied Regression Analysis, noted in reference 3.
8. This is a concept borrowed from cross validation. For a more detailed discussion of cross validation, see section 4.2 of Raymond H. Myers’ Classical and Modern Regression With Applications, second edition, Duxbury, 1990.
9. If we test for breakdown starting immediately after the period originally used to fit the model, this principle states to give the model the benefit of the doubt when it begins.
10. For more information on individuals and moving range control charts, see section 6.4 in Montgomery’s Introduction to Statistical Quality Control, noted in reference 2.
11. The levels of α, β and k should be set based on the importance of the model being checked. Consulting an experienced statistical practitioner is highly recommended. The check for model breakdown can be run multiple times with varying combinations of α, β and k to see whether any robust insights emerge across iterations.
12. See especially chapter 6 of Montgomery’s Introduction to Statistical Quality Control, noted in reference 2.
Joseph D. Conklin is a mathematical statistician in Washington, D.C. He earned a master’s degree in statistics from Virginia Tech in Blacksburg and is a senior member of ASQ. Conklin is also an ASQ-certified quality manager, engineer, auditor, reliability engineer and Six Sigma Black Belt.