3.4 PER MILLION | 2019

# Before Breakdown

## Cumulative sum control chart concepts help catch models on the edge

by Joseph D. Conklin

My friend Ken and I go back more than 30 years. We met at Acme Reengineering (names and data in this column have been changed to maintain the real organization’s confidentiality). Acme was the main parts supplier to the local automotive engine rebuilding industry.

It was my first real job out of college. I had the quality degree and title, but no experience. Ken had the experience in quality, although his official title and department were called something else. After some years, both of us were voluntarily drafted into Acme’s lean Six Sigma program.

Ken was put in charge of the daily project details. I was his unofficial answer man for statistical questions. One day, I caught Ken looking downcast. I asked what the problem was. "Model breakdown," was all he could get out.

"What customer?" I asked, figuring some parts for a rebuild order were the headache du jour. He showed me the paper in his hand, and I saw at once he meant something else.

"Old Faithful," one of the statistical models we installed as part of the lean Six Sigma program, no longer worked well (see Figure 1). It was Acme’s model for forecasting sales. Acme used it to know when to order materials or schedule production.

The model had worked extremely well for more than two years. That was how it got the nickname "Old Faithful." It had gone more and more off the rails for the previous six months, seriously underestimating orders.

On one hand, this was good for Acme because it meant more business was coming in. On the other hand, it meant a lot of delays in filling orders because there wasn’t enough material. Ken’s first priority was to fix the forecasts.

Before we started on that, Ken asked me a broader question. He pointed out that the problem crept up on Acme gradually, becoming obvious only in the past few months. He wanted to know whether there was any way to detect this kind of problem sooner. The lean Six Sigma program involved several more models, and the idea of some regular, simple checkup for trouble sounded very attractive at that moment.

### Problems with models

"That's a simple question with a complicated answer," I replied. The answer depends on the number of models, the type of breakdown and the other commitments of the staff who are able to maintain and troubleshoot the models.

All statistical models should be periodically reviewed. For a few models, an individual or small team can establish a schedule to ensure each model is regularly checked.

Not all breakdowns come with warnings. Sometimes, the world accelerates so quickly and with so much volatility, it exceeds our ability to model it. Figure 2 shows one possible example. In the pure, extreme version of this situation, there is no model to break down. The priority becomes deciding when the new environment has settled enough to make modeling practical.

Figure 3 shows the case of a sudden shock, followed by a quick return to normal. This situation will capture the organization’s attention and may eventually be accounted for. But there is little or no warning. The priority becomes deciding the likelihood of recurrence and selecting the best response for the next time.

A gradual breakdown, such as the one in Figure 1, offers some warning. If there are more models than staff or time available, some form of automated check is desirable to focus attention on the models most in danger of gradually breaking down.

For this situation, cumulative sum control charts offer one option that is easy to program in an electronic spreadsheet or statistical computing package.^{1}

### Cumulative sum control chart concept

Cumulative sum control charts are designed to detect significant shifts from a process target. Imagine process measurements X_{1}, X_{2} to X_{n} taken at times t_{1}, t_{2} to t_{n}, and assume T is the target value for the process. An intuitive measure of shift is to subtract T from each process measurement and sum the deviations through time t: S_{t} = (X_{1} - T) + (X_{2} - T) + ... + (X_{t} - T).

For an on-target process, S_{t} moves randomly around zero. If the process shifts up, S_{t} turns and remains positive. In the case of a shift down, S_{t} turns and remains negative. S_{t} is the basis for a cumulative sum control chart and gives the chart its name.^{2}
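The running-sum idea is easy to see in a few lines of code. This is an illustrative sketch, not part of the column; the `cumulative_sums` helper and both data series are made up for the example.

```python
# Running cumulative sum of deviations from target:
# S_t = (X_1 - T) + (X_2 - T) + ... + (X_t - T)
def cumulative_sums(measurements, target):
    """Return the list S_1, S_2, ..., S_n."""
    total, sums = 0.0, []
    for x in measurements:
        total += x - target
        sums.append(total)
    return sums

on_target = [10.2, 9.8, 10.1, 9.9, 10.0]   # hovers near the target T = 10
shifted   = [10.2, 9.8, 10.6, 10.7, 10.5]  # process shifts up after t = 2

print(cumulative_sums(on_target, 10.0))  # wanders around zero
print(cumulative_sums(shifted, 10.0))    # turns positive and stays positive
```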

To adapt the calculations behind cumulative sum control charts, start with three quantities: the actual values observed for times t_{1} to t_{n}, the A_{i}; the values from the statistical model you would like to check, the M_{i}; and the residuals, r_{i}, obtained from the first two. For time period t_{i}, r_{i} = A_{i} - M_{i}.^{3}

For a model that fits well, tradition prescribes four expectations for the residuals.^{4} We combine one of these, random variation around zero, with the cumulative sum control chart concept to produce a helpful indicator of model breakdown. In this case, the target T is zero. The indicator of model breakdown is a sustained shift of the residuals from zero.

### Model data for cumulative sum concept

The example data in Online Table 1 lead to our next step.

The model forecasts sales in terms of two predictor variables: the actual sales at the prior time period and whether the forecast falls during the peak season. The predictor for the peak season is coded as a dummy variable, with a value of zero for not-in-peak season and a value of one otherwise.^{5}

The model equation is: forecast sales = 25.0932 + (0.6413 * prior sales) + (173.6525 * peak season).^{6} The example data in Online Table 1 start after the period of data used to fit the model.
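Coded directly, the forecast equation and the residual calculation look like this. The function names are mine; the coefficients are the ones printed in the column.

```python
# Forecast equation from the column; peak_season is the 0/1 dummy variable.
def forecast_sales(prior_sales, peak_season):
    return 25.0932 + 0.6413 * prior_sales + 173.6525 * peak_season

# Residual for one period, r_i = A_i - M_i (actual minus model).
def residual(actual_sales, prior_sales, peak_season):
    return actual_sales - forecast_sales(prior_sales, peak_season)
```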

After obtaining the residuals, you need answers to four questions:

- What is the standard error of prediction (σ_{e})?
- What is the acceptable risk of falsely declaring a breakdown (α risk)?
- What is the acceptable risk of not detecting a breakdown (β risk)?
- How large of a sustained deviation from zero (called *k* here) counts as a breakdown?

### What is the standard error of prediction?

The standard error of prediction, symbolized as σ_{e}, is a type of standard deviation and a measure of the inherent uncertainty in the model.^{7} It is estimated from the model residuals. The estimate of σ_{e} is symbolized as s_{e}. Typically, s_{e} is computed from the same data used to fit the model.

In the example here, the model has been fit with prior data, and the data in Online Table 1 begin after that. We compute s_{e} in an alternative way, using the data in Online Table 1 and not the data used to fit the model. Our computation allows s_{e} to evolve and stabilize as more data become available. We do this for three reasons:

- It handles the case when the data used to fit the model are not available.
- An s_{e} not based on the data used to fit the model has advantages.^{8}
- It gives the model a chance to run before declaring a breakdown.^{9}

Our alternative calculation is explained in Online Table 2. It is an example of estimating the standard deviation from the moving range of the data. This technique is part of constructing individuals and moving range control charts in statistical process control.^{10}
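Online Table 2 is not reproduced here, but the moving-range technique itself can be sketched. The d2 = 1.128 divisor is the standard bias-correction constant for ranges of two consecutive points; the residuals in the example are made-up numbers, not the column's data.

```python
# Estimate the standard error s_e from the moving range of the residuals,
# the same technique used for individuals and moving range control charts.
def moving_range_estimate(residuals, d2=1.128):
    """s_e = (average of |r_t - r_{t-1}|) / d2, with d2 = 1.128 for n = 2."""
    moving_ranges = [abs(b - a) for a, b in zip(residuals, residuals[1:])]
    return sum(moving_ranges) / len(moving_ranges) / d2

r = [2.1, -1.4, 0.6, -0.3, 1.8]  # hypothetical residuals
print(round(moving_range_estimate(r), 3))  # about 1.884
```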

### What are the acceptable risks of falsely declaring a breakdown (α risk) and of not detecting a breakdown (β risk)?

There are two basic risks in testing for a model breakdown: incorrectly declaring a breakdown and failing to detect one when it occurs.

The first kind of risk is called α risk for short. The second kind is called β risk. Both are expressed as probabilities between 0 and 1.

The calculations for our model breakdown indicator require fixing values for α and β. A reasonable range of values for α is 0 to 0.10. A reasonable range of values for β is 0 to 0.20. For our example, we use α = 0.05 and β = 0.15.^{11}

The reasonable range for α is narrower than the range for β because, in the context of this application, Acme considers α the higher risk. Acme bases this conclusion on:

- Its estimated cost of responding to a false alarm.
- Its belief that the present trouble with meeting orders has raised its awareness such that the next breakdown will not remain undetected for long.

### How large of a sustained shift from zero counts as a breakdown?

A sustained shift of the residuals, our indicator of model breakdown, is measured in terms of s_{e}. A useful rule in many applications is to set k, the level of shift that signals a breakdown, between 0.25s_{e} and 1.50s_{e}.

A shift of less than 0.25s_{e} may in practice be too small to represent serious damage to the model. If the interest is in shifts larger than 1.50s_{e}, more basic control chart techniques are usually appropriate.^{12} In our example, we use k = 0.75s_{e}.

### Putting all the pieces together

As explained in more detail in Online Table 3, we put all the pieces together to construct the model breakdown detector as follows for time period t and residual r_{t}:

- Compute -[(k*s_{e})/2] - r_{t} and update S(t)_{low}.
- Compute r_{t} - [(k*s_{e})/2] and update S(t)_{high}.
- Update h_{t} and compare it to S(t)_{low} and S(t)_{high}.
- Conclude the model has broken down in a negative direction if S(t)_{low} > h_{t}.
- Conclude the model has broken down in a positive direction if S(t)_{high} > h_{t}.

In our example, there is evidence the model has broken down in a positive direction at time period 40.
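Without Online Table 3 in hand, the bookkeeping above can still be sketched as a standard two-sided tabular cumulative sum on the residuals. Two simplifications here are mine, not the column's: the decision interval h is held fixed rather than updated each period, and it is set from α, β and the shift size δ using the common sequential-test approximation h = (s_e / δ) ln((1 - β) / α).

```python
import math

# Two-sided cusum on residuals with target zero. delta is the shift size
# in multiples of s_e (so k = delta * s_e, matching the column's k = 0.75 s_e).
def cusum_breakdown(residuals, s_e, delta=0.75, alpha=0.05, beta=0.15):
    """Return (time period, direction) at the first signal, or None."""
    k = delta * s_e
    # Assumed fixed decision interval (sequential-test approximation),
    # not the column's h_t update from Online Table 3.
    h = (s_e / delta) * math.log((1 - beta) / alpha)
    s_low = s_high = 0.0
    for t, r in enumerate(residuals, start=1):
        s_high = max(0.0, s_high + r - k / 2)  # accumulates upward shifts
        s_low = max(0.0, s_low - r - k / 2)    # accumulates downward shifts
        if s_high > h:
            return t, "positive"
        if s_low > h:
            return t, "negative"
    return None
```

With the example settings (δ = 0.75, α = 0.05, β = 0.15), a run of residuals that drifts upward trips a positive-direction signal once the upper sum crosses h, mirroring the kind of signal Acme saw at time period 40.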

### Aftermath

Ken used the model breakdown technique to determine when the sales forecast deterioration set in. If a model has broken down, the response depends on which of the three basic contexts described in the opening section applies.

In the first context, the appropriate response is to understand better how the world is changing before improving the model. In the second context, the first priority should be determining whether the shock represents a real event and not an error in the data-collection process.

In the third context, gradual breakdown, the problem-solving strategies for diagnosing out-of-control points on a control chart are a reasonable next step.

If more than one model is of concern, the preceding technique can be implemented in a computer program that checks each one on a predetermined schedule and flags any for investigation when a breakdown is threatening. This helps ensure consistent performance across all the models.

### Editor’s Note

The column is based on a true story involving a real organization. Names and data have been changed to maintain the organization’s confidentiality.

### References and Notes

1. The calculations in this article are carried out in Microsoft Excel.
2. For more information on cumulative sum control charts, see Douglas C. Montgomery, *Introduction to Statistical Quality Control*, seventh edition, John Wiley and Sons, 2013. Because the emphasis in this column is on calculations that can be programmed, the V mask (graphical version) of the charts is not discussed.
3. Residuals play a critical role in assessing the quality of statistical models. For more information, see Norman R. Draper and Harry Smith's *Applied Regression Analysis*, third edition, John Wiley, 1998. The residuals also can be defined just as easily as r_{i} = M_{i} - A_{i}. The important thing is to choose a definition and stay consistent throughout the discussion.
4. These are: (1) random variation around zero, (2) independence, (3) constant variance and (4) close approximation to a normal distribution. Independence equates to the present residual conveying no information about how the next one will behave. Constant variance means the basic uncertainty of the model is the same over the ranges of the predictor variables. Normal distribution is not a strict requirement for a well-performing model but permits convenient and precise expressions of the basic uncertainty. For more information, see Draper and Smith's *Applied Regression Analysis*, noted in reference 3.
5. Dummy variables are a common way to incorporate qualitative variables into models. For more information, see Draper and Smith's *Applied Regression Analysis*, noted in reference 3.
6. The calculations for the model coefficients were carried out with more precision than that displayed in Online Table 1 and elsewhere.
7. For a more detailed discussion of the standard error of prediction, including how to calculate it, see Draper and Smith's *Applied Regression Analysis*, noted in reference 3.
8. This is a concept borrowed from cross validation. For a more detailed discussion of cross validation, see section 4.2 of Raymond H. Myers' *Classical and Modern Regression With Applications*, second edition, Duxbury, 1990.
9. If we test for breakdown starting immediately after the period originally used to fit the model, this principle says to give the model the benefit of the doubt when it begins.
10. For more information on individuals and moving range control charts, see section 6.4 in Montgomery's *Introduction to Statistical Quality Control*, noted in reference 2.
11. The levels of α, β and k should be set based on the importance of the model being checked. Consulting an experienced statistical practitioner is highly recommended. The check for model breakdown can be run multiple times with varying combinations of α, β and k to see whether any robust insights emerge across iterations.
12. See especially chapter 6 of Montgomery's *Introduction to Statistical Quality Control*, noted in reference 2.

**Joseph D. Conklin** is a mathematical statistician in Washington, D.C. He earned a master's degree in statistics from Virginia Tech in Blacksburg and is a senior member of ASQ. Conklin is also an ASQ-certified quality manager, engineer, auditor, reliability engineer and Six Sigma Black Belt.
