3.4 PER MILLION
When Your Process Has Runs, Trends and Cycles
by Joseph D. Conklin
As a Six Sigma practitioner, you sometimes work with processes that have memory, in which the value observed at some earlier time partly influences or determines the current value. When process data are plotted in time order, the appearance of runs, trends and cycles is evidence of memory. (See Figures 1, 2 and 3 for basic examples of each.)
It is appropriate to use time series methods for these kinds of data, but before doing so, you should see if the time correlations are a natural part of the process or the result of sampling too often. If the interval between samples is much shorter than the minimum time the process needs to change, you risk introducing correlations unnecessarily.
An alternative to using time series methods is to lengthen the time between observations. If the correlations are the result of too frequent sampling, widening the time between observations will eventually reduce or eliminate them. Then more basic statistical methods that don’t assume correlations may be used for analysis, prediction and control.
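The effect of widening the time between observations can be illustrated with a small simulation. This is only a sketch under stated assumptions: the process, its `carryover` parameter and the choice of keeping every 10th value are all hypothetical and are not taken from the article's data.

```python
import random

def lag1_autocorrelation(values):
    """Sample correlation between each value and the one just before it."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

def simulate_process_with_memory(n, carryover=0.9, seed=1):
    """Hypothetical process: each value keeps `carryover` of the previous one plus noise."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(carryover * values[-1] + rng.gauss(0.0, 1.0))
    return values

frequent = simulate_process_with_memory(5000)
thinned = frequent[::10]  # assumption: keep every 10th observation

r_frequent = lag1_autocorrelation(frequent)
r_thinned = lag1_autocorrelation(thinned)
print(round(r_frequent, 2), round(r_thinned, 2))
```

The lag-one correlation of the thinned series should come out noticeably smaller than that of the frequently sampled series. How far apart observations need to be in practice depends on the process.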
You should consider using time series methods when the correlations are natural to the process or when the sampling frequency can’t be reduced. There are many methods to choose from, the most common being:
- Moving average: averages a predetermined number of consecutive process values to produce the next result for analysis or prediction purposes.
- Autoregressive: makes the current values of the process a function of earlier ones for the purposes of analysis and prediction.1 It’s similar to linear regression.
- Exponential smoothing: the same as a moving average, except the weight each value receives in the average declines by some exponential function the further back in time a value is.2
- Fourier transform: tries to express the process values as the sum of trigonometric functions that have a cyclic pattern, such as sine and cosine functions.3
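The first and third methods in the list can be sketched in a few lines. The function names, the starting value for the smoothed level and the sample numbers are assumptions made for illustration. Simple exponential smoothing is only one of several variants.

```python
def moving_average_forecast(values, window):
    """Predict the next value as the plain average of the last `window` values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    recent = values[-window:]
    return sum(recent) / window

def exponential_smoothing_forecast(values, alpha):
    """Predict the next value with simple exponential smoothing.

    The weight on a value k steps back is proportional to (1 - alpha) ** k,
    so older values count for less and less.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # assumption: start the level at the first observation
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

history = [10.0, 12.0, 11.0, 13.0]  # made-up numbers for illustration
print(moving_average_forecast(history, window=3))
print(exponential_smoothing_forecast(history, alpha=0.5))
```

A larger `alpha` makes the smoothed forecast react faster to recent values. A smaller one behaves more like a long moving average.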
Autoregressive Method Example
To understand the autoregressive method, consider the data in Table 1, where the values are company sales figures gathered over 36 consecutive months. The sales are partly subject to medium and long-term economic forces whose effects tend to persist from one month to the next, so it is reasonable to expect the data to show time correlations.
To decide which time series method to apply, you should create a time series plot of the data (see Figure 4). This particular plot suggests a combination of two patterns: a cycle with an upward trend. (Compare Figure 4 to the illustration of this combination in Figure 5.) The upward trend persists through the entire 36 months, and the cyclic pattern around the trend appears to run in approximately 12-month intervals.
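A visual estimate of the cycle length can be cross-checked numerically. The sketch below removes a straight-line trend and then looks for the strongest Fourier component. The series is a synthetic stand-in with a built-in trend and 12-month cycle, not the Table 1 data, and the helper names are hypothetical.

```python
import cmath
import math

def remove_linear_trend(values):
    """Subtract the least-squares straight line through (0, y0), (1, y1), ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(values)]

def dominant_period(values):
    """Return the period (in samples) of the strongest Fourier component."""
    n = len(values)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        if abs(coeff) > best_power:
            best_k, best_power = k, abs(coeff)
    return n / best_k

# Synthetic stand-in for 36 months of sales: upward trend plus a 12-month cycle.
sales = [100 + 2.0 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(36)]
period = dominant_period(remove_linear_trend(sales))
print(period)
```

With real data the peak is rarely this clean, so the result is best treated as a check on what the plot already suggests.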
To perform the autoregressive method, take a look at Table 2. It will help you build a linear regression equation. The dependent variable (the y in the linear regression equation) is in column six (current month’s sales). The independent variables (the x’s in the linear regression equation) are in columns three, four, five
The dummy variables are constructed by dividing the 12-month interval for a cycle into four quarters. With 35 values available for the linear regression equation, there are not enough data to construct dummy variables based on individual months or weeks.4
For the sales data, quarter one corresponds to the first three months of each 12-month cycle or months 1, 2, 3, 13, 14, 15, 25, 26 and 27. The other quarters correspond to these months:
- Quarter two: 4, 5, 6, 16, 17, 18, 28, 29 and 30.
- Quarter three: 7, 8, 9, 19, 20, 21, 31, 32 and 33.
- Quarter four: 10, 11, 12, 22, 23, 24, 34, 35 and 36.
Linear Regression Equation
In the regression equation, the dummy variables have two possible values: 0 or 1. When a particular month belongs to quarter one, the value of the dummy variable for quarter one (D1) is 1. In all other months, the value is 0. The same pattern holds true for the dummy variables for quarters two and three (D2 and D3, respectively). Three dummy variables are enough to represent four quarters. Quarter four corresponds to the months in which D1, D2 and D3 are simultaneously 0. Table 2 shows the full layout for the dummy variables.
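The dummy variable layout can be generated mechanically from the month number. This is a minimal sketch. The function names are hypothetical, and months are assumed to be numbered from 1 as in the text.

```python
def quarter_of_cycle(month, cycle_length=12):
    """Return 1-4: which quarter of its 12-month cycle a 1-based month falls in."""
    position = (month - 1) % cycle_length  # 0..11 within the cycle
    return position // (cycle_length // 4) + 1

def quarter_dummies(month):
    """Return (D1, D2, D3); quarter four is the baseline where all three are 0."""
    quarter = quarter_of_cycle(month)
    return tuple(1 if quarter == q else 0 for q in (1, 2, 3))

for month in (1, 13, 17, 33, 36):
    print(month, quarter_dummies(month))
```

Using three dummies rather than four avoids redundancy: a fourth would always equal 1 minus the sum of the other three.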
To understand the linear regression equation, take a look at Table 3, which edits and adapts the results generated by the regression module in the Microsoft Excel Analysis ToolPak for Windows 2000. The adjusted R square measures how well the linear regression fits the sales data. The closer the value is to 1, the better the overall fit; a value near 0 signals a poor fit (the adjusted version can even fall slightly below 0). With an adjusted R square of 0.998, the overall fit appears to be excellent.
The analysis of variance (ANOVA) part of Table 3 acts as another check on the suitability of the equation. Because it is possible to obtain an excellent fit by luck, the ANOVA table helps measure the degree to which the fit appears to be real and not merely coincidental.5 The key column to look at is significance F. The possible values for this column are between 0 and 1, and values close to 0 are evidence the fit is real. The value for this table is < 0.0001, which is strong evidence the excellent fit suggested by the adjusted R square is real.
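For readers who want to see where the adjusted R square and the F statistic behind significance F come from, here is a minimal sketch using the standard formulas for a regression with an intercept. The function name and the five-point example are made up. Converting F into significance F also requires the F distribution, which this sketch leaves out.

```python
def fit_summary(actual, predicted, n_predictors):
    """Return (r_square, adjusted_r_square, f_statistic).

    Assumes `predicted` comes from a least-squares fit that includes an intercept.
    """
    n = len(actual)
    mean = sum(actual) / n
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_error = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_regression = ss_total - ss_error
    df_error = n - n_predictors - 1
    r_square = 1 - ss_error / ss_total
    adjusted = 1 - (1 - r_square) * (n - 1) / df_error
    f_statistic = (ss_regression / n_predictors) / (ss_error / df_error)
    return r_square, adjusted, f_statistic

# Made-up example: y fitted on x = 1..5 by least squares (y-hat = 2.2 + 0.6 x).
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
r_square, adjusted, f_statistic = fit_summary(actual, predicted, n_predictors=1)
print(round(r_square, 3), round(adjusted, 3), round(f_statistic, 3))
```

The adjustment penalizes each extra independent variable, which is why the adjusted value is never larger than the plain R square.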
The middle section of Table 3 shows the coefficients in the linear regression equation for sales. The P-value column serves the same purpose as the significance F column. The possible values for the P-value column are also between 0 and 1. With all the values for this column being < 0.0001, there’s strong evidence all the coefficients listed make a real contribution to the fit and should be included in the equation.
The last section of Table 3 shows the linear regression equation for sales. Including the prior month’s sales as an independent variable is what makes the equation an example of the autoregressive time series method because the current value for sales becomes a function of an earlier one.
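The structure of such an equation can be sketched without a spreadsheet. The example below assumes the independent variables are the prior month's sales plus D1, D2 and D3. It fits them by ordinary least squares on synthetic, noise-free sales generated from made-up coefficients, so the fit can be checked against known values. None of the numbers come from Table 1 or Table 3.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def quarter_dummies(month):
    """D1, D2, D3 for a 1-based month; quarter four is all zeros."""
    quarter = ((month - 1) % 12) // 3 + 1
    return [1.0 if quarter == q else 0.0 for q in (1, 2, 3)]

def design_rows(sales):
    """One row per month 2..n: [1, prior month's sales, D1, D2, D3]."""
    rows, y = [], []
    for month in range(2, len(sales) + 1):
        rows.append([1.0, sales[month - 2]] + quarter_dummies(month))
        y.append(sales[month - 1])
    return rows, y

# Made-up coefficients: intercept, prior month's sales, D1, D2, D3.
true_coeffs = [20.0, 0.9, 5.0, -3.0, 2.0]
sales = [100.0]
for month in range(2, 37):
    row = [1.0, sales[-1]] + quarter_dummies(month)
    sales.append(sum(c * v for c, v in zip(true_coeffs, row)))

rows, y = design_rows(sales)  # 35 rows, because month 1 has no prior month
fitted = least_squares(rows, y)
print([round(c, 3) for c in fitted])
```

Because the synthetic data contain no noise, the fitted coefficients should reproduce the made-up ones. Real sales data would leave residuals to examine.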
Table 4 shows the predicted values for sales next to the actual ones. It also shows the residuals, which are the differences between the actual and the predicted values.
To evaluate the quality of the equation, it is helpful to look at plots of the residuals (see Figures 6 and 7).6 Both plots show a random scattering of points and are signs of a good equation. If the plots of the residuals showed some kind of pattern, the equation would need to be improved.
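Plots remain the primary tool, but two simple numerical checks can back them up. The sketch below computes residuals, counts runs of same-sign residuals and measures the correlation between consecutive residuals. The function names and sample values are hypothetical, and no formal test thresholds are implied.

```python
def residuals(actual, predicted):
    """Actual minus predicted, in time order."""
    return [a - p for a, p in zip(actual, predicted)]

def count_sign_runs(values):
    """Count runs of consecutive residuals sharing the same sign (zeros skipped)."""
    signs = [v > 0 for v in values if v != 0]
    if not signs:
        return 0
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)

def lag1_autocorrelation(values):
    """Sample correlation between each residual and the one just before it."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

# Made-up actual and predicted values for illustration.
actual = [10.0, 12.0, 11.0, 14.0, 13.0, 15.0]
predicted = [9.0, 12.5, 11.5, 12.0, 13.5, 14.0]
res = residuals(actual, predicted)
print(res, count_sign_runs(res), round(lag1_autocorrelation(res), 3))
```

Very few runs, or a lag-one correlation far from 0, would hint at a leftover pattern of the kind the residual plots are meant to reveal.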
What To Do With A Strong Equation
The linear regression equation can be used to predict future sales. As long as the predictions reasonably match reality, the company is in a good position to manage its business. If the results begin to diverge substantially, however, the company should investigate whether the forces affecting sales have begun to change.
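One way to make "diverge substantially" concrete is to compare each new forecast error with the spread of the residuals from the fitting period. The sketch below is one possible rule, not the article's. The three-standard-deviation limit, the function names and the numbers are all assumptions.

```python
import math

def residual_std(fit_residuals, n_coefficients):
    """Residual standard deviation with n - n_coefficients degrees of freedom."""
    df = len(fit_residuals) - n_coefficients
    return math.sqrt(sum(r * r for r in fit_residuals) / df)

def diverging_months(actual, predicted, sigma, k=3.0):
    """Return 0-based positions where |actual - predicted| exceeds k * sigma."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted))
            if abs(a - p) > k * sigma]

# Made-up residuals from the fitting period and made-up new months.
sigma = residual_std([1.0, -1.0, 2.0, -2.0, 1.0, -1.0], n_coefficients=2)
flagged = diverging_months(actual=[100.0, 105.0, 130.0],
                           predicted=[101.0, 104.0, 110.0],
                           sigma=sigma)
print(round(sigma, 3), flagged)
```

A flagged month is a prompt to investigate whether the forces affecting sales have changed, not proof that they have.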
The exponential smoothing method could also have been used in this example. Because more than one method may suit the same data, it’s important for Six Sigma practitioners to learn about as many time series methods as possible.
REFERENCES AND NOTES
1. For more information on moving average and autoregressive time series methods, see Time Series Analysis: Forecasting and Control, third edition, by George Box, Gwilym M. Jenkins and Gregory C. Reinsel (Prentice Hall, 1994).
2. For more information on exponential smoothing, see Forecasting and Time Series Analysis, second edition, by Douglas C. Montgomery, Lynwood A. Johnson and John S. Gardiner (McGraw-Hill, 1990).
3. For more information on the Fourier transform, see The Fourier Transform and Its Applications, third edition, by Ronald N. Bracewell (McGraw-Hill, 2000).
4. For more information on linear regression analysis, see Applied Regression Analysis, third edition, by Norman R. Draper and Harry Smith (John Wiley, 1998). A useful rule of thumb in building a linear regression equation is to have at least five to 10 data values for every independent variable. There are 35 data points for the linear equation instead of 36 because there is no prior month value for month one.
5. For more information on ANOVA tables, see Design and Analysis of Experiments, fifth edition, by Douglas C. Montgomery (John Wiley, 2001).
6. Applied Regression Analysis (see reference 4) contains more information on analysis of residuals.
JOSEPH D. CONKLIN is a mathematical statistician at the U.S. Department of Energy in Washington, DC. He earned a master’s degree in statistics from Virginia Tech and is a Senior Member of ASQ. Conklin is also an ASQ certified quality manager, quality engineer, quality auditor and reliability engineer.