## 2020

STATISTICS ROUNDTABLE

# It Depends

## The effect of dependent observations on process control

by Robert L. Mason and John C. Young

A Shewhart chart is an excellent tool for detecting abrupt process changes. It can also detect small process changes through the use of run rules.1

For example, observing a run of seven or eight consecutive observations of the plotted statistic above or below the centerline of the chart would constitute a signal. Runs can be caused by a change in the process, but they can also be generated by data dependencies.
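A run rule can be stated precisely in a few lines of code. The sketch below assumes the rule "eight consecutive points strictly on one side of the centerline," with a point exactly on the centerline breaking the run. The function name `run_rule_signals` and that tie-breaking convention are our own illustrative choices, not a standard library routine.

```python
def run_rule_signals(values, centerline, run_length=8):
    """Return the indices at which a run of `run_length` or more consecutive
    points on the same side of the centerline has been reached.

    Hypothetical helper: a point exactly on the centerline resets the run.
    """
    signals = []
    current = 0      # length of the current one-sided run
    prev_side = 0    # +1 above, -1 below, 0 on the centerline
    for i, v in enumerate(values):
        side = (v > centerline) - (v < centerline)
        if side != 0 and side == prev_side:
            current += 1
        elif side != 0:
            current = 1
        else:
            current = 0
        prev_side = side
        if current >= run_length:
            signals.append(i)
    return signals


# Eight subgroup means above a centerline of 0.597, then one below.
print(run_rule_signals([0.7] * 8 + [0.5], centerline=0.597))
```

Every point after the eighth in the same run is also flagged; some practitioners flag only the first point of a run, which is an equally reasonable convention.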

In fact, if you are working with data whose dependency takes the form of autocorrelation, you should expect to see long runs of the plotted statistic above and below the centerline.
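A quick simulation illustrates why autocorrelation produces long runs. The sketch below assumes a first-order autoregressive, or AR(1), model with coefficient φ = 0.9 (our assumption; these are not the industrial data in Figure 1) and compares the longest one-sided run with that of independent data of the same length. The helper names are hypothetical.

```python
import random


def simulate_ar1(n, phi, seed):
    """AR(1) series x_t = phi * x_{t-1} + sqrt(1 - phi^2) * e_t, scaled so Var(x_t) = 1.

    phi = 0 gives independent standard normal observations.
    """
    rng = random.Random(seed)
    scale = (1.0 - phi * phi) ** 0.5
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    out = []
    for _ in range(n):
        x = phi * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def longest_run(values, centerline=0.0):
    """Length of the longest run of consecutive points strictly on one side of the centerline."""
    longest = current = prev = 0
    for v in values:
        side = (v > centerline) - (v < centerline)
        current = current + 1 if side != 0 and side == prev else (1 if side != 0 else 0)
        prev = side
        longest = max(longest, current)
    return longest


independent = simulate_ar1(568, phi=0.0, seed=1)
autocorrelated = simulate_ar1(568, phi=0.9, seed=1)
print(longest_run(independent), longest_run(autocorrelated))
```

The exact run lengths depend on the seed, but with φ = 0.9 the autocorrelated series will typically show runs several times longer than the independent one.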

Figure 1 is a Shewhart chart of the means of 568 subgroups of size 2. The centerline is at the target mean value of 0.597, and the population standard deviation is 0.368. This is a plot of an industrial set of data that is highly autocorrelated.
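For reference, the limits on a chart like this one can be computed from the stated values. The sketch below assumes conventional 3-sigma limits and treats 0.368 as the standard deviation of individual observations, so the standard deviation of a mean of two observations is 0.368/√2; the function name `xbar_limits` is hypothetical.

```python
import math


def xbar_limits(center, sigma, n, k=3.0):
    """Shewhart limits for subgroup means with known sigma: center +/- k * sigma / sqrt(n)."""
    half_width = k * sigma / math.sqrt(n)
    return center - half_width, center + half_width


lcl, ucl = xbar_limits(0.597, 0.368, 2)
print(round(lcl, 4), round(ucl, 4))  # -0.1836 1.3776
```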

Observe the extended run below the centerline in the first part of the chart and the extended run above the centerline toward the end of the chart. The relatively large amount of variation observed in the subgroup means is common to this type of process variable, and much of it is induced by its relationships with other process variables.

### EWMA and CUSUM

Two other popular univariate control procedures are the exponentially weighted moving average (EWMA) and the cumulative sum (CUSUM) charts.2 The CUSUM statistic is constructed using information on the current observation as well as on all past observations. These observations are usually assumed to be independent normal variates.

The chart statistic accumulates information in the form of a deviation of each observation from a target value. The value of the CUSUM statistic is compared to calculated critical limits that change as sampling progresses. When the process shifts from the target value, the deviations from the target become large. The accumulation of these larger deviations leads to a signal indicating a process shift.
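The article does not give a specific CUSUM formula. As one hedged illustration, here is a sketch of the tabular (decision-interval) CUSUM found in many textbooks, assuming a known target and standard deviation. The function name `tabular_cusum` and the defaults k = 0.5 and h = 5 (both in units of sigma) are our assumptions. Note that this form compares the sums against a fixed decision interval h rather than limits that change as sampling progresses.

```python
def tabular_cusum(values, target, sigma, k=0.5, h=5.0):
    """Tabular CUSUM: accumulate deviations from target beyond an allowance K = k*sigma.

    Returns the upper sums, the lower sums and the indices where either sum exceeds H = h*sigma.
    """
    K, H = k * sigma, h * sigma
    c_plus = c_minus = 0.0
    upper, lower, signals = [], [], []
    for i, x in enumerate(values):
        # Upward deviations build up c_plus; downward deviations build up c_minus.
        c_plus = max(0.0, c_plus + (x - target) - K)
        c_minus = max(0.0, c_minus + (target - x) - K)
        upper.append(c_plus)
        lower.append(c_minus)
        if c_plus > H or c_minus > H:
            signals.append(i)
    return upper, lower, signals


# A sustained one-sigma shift above target: each point adds 0.5 to c_plus.
_, _, shift_signals = tabular_cusum([1.0] * 20, target=0.0, sigma=1.0)
print(shift_signals[0])  # 10
```

Because each point contributes only its excess over the allowance, a single moderate deviation rarely signals, but a sustained small shift accumulates until the sum crosses h.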

Similar to the CUSUM statistic, the EWMA statistic accumulates information from all previous observations. However, it uses a weighting factor (λ) that assigns less weight to older observations than to current observations. The variable mean provides the initial condition for the charting procedure.

The plotted EWMA statistic will vary at random about this value until a sustained shift occurs. When a shift occurs, the statistic will move toward the shifted mean. The speed of movement depends on the assigned weight factor. The EWMA statistic is compared to calculated control limits that stabilize as sampling progresses. A basic assumption is that the observations are independent and normally distributed.
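The following sketch shows one standard form of the EWMA chart, assuming a known target mean μ₀ as the starting value and a known standard deviation σ of the plotted values (for subgroup means of size n this would be the process σ divided by √n). The time-varying limits μ₀ ± Lσ√(λ/(2 − λ)·[1 − (1 − λ)^(2t)]) widen toward a constant, which matches the limits "that stabilize as sampling progresses." The name `ewma_chart` and the default L = 3 are assumptions.

```python
import math


def ewma_chart(values, mu0, sigma, lam=0.1, L=3.0):
    """EWMA z_t = lam * x_t + (1 - lam) * z_{t-1} with z_0 = mu0.

    Returns (z_t, lower limit, upper limit, signal?) for each observation.
    """
    z = mu0
    rows = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact variance factor; it approaches lam / (2 - lam) as t grows.
        width = L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        rows.append((z, mu0 - width, mu0 + width, abs(z - mu0) > width))
    return rows


first = ewma_chart([1.0], mu0=0.0, sigma=1.0)[0]
print(first)  # z_1 = 0.1, limits about -0.3 and 0.3, no signal
```

At t = 1 the limit half-width reduces to Lσλ, and for large t it settles at Lσ√(λ/(2 − λ)), about 0.688σ when λ = 0.1.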

### Relying on memory?

Major differences exist among the three charting procedures. The Shewhart procedure without run rules has no memory; it does not rely on past observations. It is highly sensitive to large changes (±3σ or greater) in the mean of the distribution of the monitored variable, and the addition of run rules enhances its sensitivity to small shifts in location.

In contrast, both the CUSUM and EWMA statistics have memory with regard to past information, and both, by design, are very sensitive to small changes (±0.5σ to ±1.5σ) in the location of the monitored distribution.
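Unrolling the EWMA recursion makes its memory explicit. Assuming the usual recursion with starting value z₀ equal to the target mean (the standard textbook form, not written out in the article), the statistic is a weighted sum of all past observations with geometrically decaying weights:

```latex
z_t = \lambda x_t + (1-\lambda)\, z_{t-1}
    = \lambda \sum_{j=0}^{t-1} (1-\lambda)^{j}\, x_{t-j} + (1-\lambda)^{t} z_0
```

With λ = 0.1, the current observation receives weight 0.1, the previous one 0.09, the one before that 0.081, and so on. Setting λ = 1 puts all the weight on x_t and none on the past, which is the memoryless Shewhart statistic.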

All three procedures require that the observations be independently and identically distributed and that the data contain only random variation. This type of variation is common to certain industries such as the fixed part manufacturing industry. For these industries, extraneous sources of variation are attributed to assignable causes and correctly interpreted as a signal.

However, in the processing industry, data dependencies are created by load changes, ramping or demand factors. These dependencies might be mistakenly interpreted as assignable causes even though they are an inherent part of the typical process variation. With this type of data, the EWMA and the CUSUM statistics, because of their memories, are more likely than the Shewhart procedure to signal in the presence of data dependencies.

As an example, we reconsider the data plotted in Figure 1 and construct an EWMA chart using the same subgroup means, with weighting factor λ = 0.1. The resulting EWMA plot is presented in Figure 2.

Notice the occurrence of the same pattern of extended runs above and below the mean line as was seen in Figure 1, indicating the effects of autocorrelation. In addition, notice the large number of signals in this chart relative to the small number of signals given in the Shewhart chart in Figure 1. A CUSUM chart would yield a similar result.
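To see why memory matters, the sketch below simulates an AR(1) series (φ = 0.9 and the seed are our assumptions; these are not the Figure 1 data) and counts the points that fall outside limits computed as if the observations were independent standard normal. Setting λ = 1 turns the EWMA into a memoryless Shewhart individuals chart with ±3 limits, so the same hypothetical `count_signals` function covers both charts.

```python
import math
import random


def ar1(n, phi, seed):
    """AR(1) series scaled to unit marginal variance."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - phi * phi)
    x, out = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        x = phi * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def count_signals(values, lam, L=3.0):
    """Count EWMA signals using limits that assume independent N(0, 1) observations.

    lam = 1 reduces the EWMA to a Shewhart individuals chart with +/- L limits.
    """
    z, count = 0.0, 0
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        width = L * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if abs(z) > width:
            count += 1
    return count


data = ar1(568, phi=0.9, seed=7)
shewhart = count_signals(data, lam=1.0)
ewma = count_signals(data, lam=0.1)
print(shewhart, ewma)
```

The exact counts depend on the seed, but on strongly autocorrelated data the λ = 0.1 count is typically many times larger than the λ = 1 count, even though the process mean never changes.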

This increase in signals is due to the effect of the autocorrelation on the memory component of the EWMA and CUSUM statistics. In fixed part industries, these additional signals would indicate that the autocorrelation is an assignable cause that needs to be mitigated. However, in processing industries, where such autocorrelation is inherent to the process, these excess signals are false alarms.

### Small mean shifts

The multivariate exponentially weighted moving average (MEWMA) statistic3 and the multivariate cumulative sum (MCUSUM) statistic4 are the multivariate extensions of their univariate counterparts described earlier. As in the univariate case, attention is focused on detecting small mean shifts among the process variables. Also, both statistics are based on the accumulation of information from past observation vectors.

The use of the MEWMA and MCUSUM statistics is based on the assumption that the observations are independently and identically distributed. As with their univariate counterparts, data dependencies such as trends, step changes and autocorrelation will induce multiple signals in control charts based on these statistics.

However, for data collected on processes in a processing industry, the memory property works against these statistics. The same property that helps them produce a signal when the observations are not independently and identically distributed becomes a liability when those data characteristics are an inherent part of the process.

This result occurs because these two procedures were not designed to filter variation from extraneous sources. To stress the importance of this point, we examine the performance of the MEWMA statistic (labeled Tₜ²) for a two-variable process in which the data on each variable are autocorrelated. The plot of the statistic is presented in Figure 3 (p. 72) for three values of the weighting factor λ: 0.2, 0.6 and 0.8.
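For concreteness, here is a sketch of the MEWMA statistic Tₜ² for two variables, assuming a known mean vector and covariance matrix and the commonly used exact covariance of Zₜ, λ/(2 − λ)·[1 − (1 − λ)^(2t)]·Σ. The article does not state which covariance form was used for Figure 3, so treat that as our assumption; the name `mewma_t2` is hypothetical, and the 2 × 2 inverse is written out to keep the example dependency-free.

```python
def mewma_t2(observations, mean, cov, lam):
    """Two-variable MEWMA statistic T_t^2 = Z_t' Cov(Z_t)^{-1} Z_t.

    observations: list of (x1, x2); mean: (mu1, mu2);
    cov: ((s11, s12), (s12, s22)), assumed known.
    """
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    inv11, inv12, inv22 = s22 / det, -s12 / det, s11 / det
    z1 = z2 = 0.0  # Z_0 = 0 for mean-centered data
    out = []
    for t, (x1, x2) in enumerate(observations, start=1):
        d1, d2 = x1 - mean[0], x2 - mean[1]
        z1 = lam * d1 + (1.0 - lam) * z1
        z2 = lam * d2 + (1.0 - lam) * z2
        c = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))  # Cov(Z_t) = c * Sigma
        quad = inv11 * z1 * z1 + 2.0 * inv12 * z1 * z2 + inv22 * z2 * z2
        out.append(quad / c)
    return out


identity = ((1.0, 0.0), (0.0, 1.0))
print(mewma_t2([(3.0, 4.0), (0.0, 0.0)], (0.0, 0.0), identity, lam=0.2))
```

The signal threshold for Tₜ² depends on λ and the number of variables and is usually chosen by simulation to give a desired in-control average run length; this sketch does not compute it.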

Observe that large values of the statistic occur in all three of the stacked figures due to the autocorrelation in the data. Also note that for corresponding points across the three graphs, the value of the statistic decreases as the value of λ increases.

As λ approaches 1, the value of the MEWMA statistic, as well as the number of signals, decreases. For λ = 0.2, 71.3% of the observations produce signals on the MEWMA chart. For λ = 0.6 and 0.8, the percentages drop to 21.9% and 3.1%, respectively. This indicates that the sensitivity of the MEWMA statistic to data dependencies, such as autocorrelated data, is controlled by the choice of the weight factor.

Sensitivity increases as λ approaches 0 and decreases as λ approaches 1. Perhaps the MEWMA statistic could be made less sensitive to these data characteristics by increasing the value of the weight factor. However, as λ approaches 1, the performance of the MEWMA becomes similar to that of Hotelling's T² statistic,5 which is the multivariate counterpart to the standardized Shewhart statistic. The two are equal for λ = 1.
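The equality at λ = 1 can be checked directly from the definitions. Assuming the commonly used MEWMA form with Z₀ = 0 and exact covariance of Zₜ (our assumption about the form; the article does not write it out), setting λ = 1 makes Zₜ the current mean-adjusted observation and makes the covariance factor equal to 1:

```latex
\mathbf{Z}_t = \lambda(\mathbf{X}_t - \boldsymbol{\mu}) + (1-\lambda)\,\mathbf{Z}_{t-1},
\qquad
\boldsymbol{\Sigma}_{Z_t} = \frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2t}\right]\boldsymbol{\Sigma}

T_t^2 = \mathbf{Z}_t^{\top}\boldsymbol{\Sigma}_{Z_t}^{-1}\mathbf{Z}_t
\;\xrightarrow{\;\lambda = 1\;}\;
(\mathbf{X}_t - \boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{X}_t - \boldsymbol{\mu}) = T^2
```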

### The T² statistic

There are numerous ways to describe the T² statistic. A common definition is that the statistic is the squared statistical distance from the observation vector to the population mean vector, or to the sample mean vector, relative to the known or estimated covariance matrix. Unlike the MEWMA and MCUSUM statistics, the T² statistic has no memory.
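The definition translates directly into code. This sketch assumes two variables with a known mean vector and covariance matrix; in that case a common control limit is the upper α quantile of the chi-square distribution with p = 2 degrees of freedom, which has the closed form −2 ln α. When the mean and covariance are estimated from a sample, the limit is instead based on F or beta distributions, which this sketch does not cover. The function names are hypothetical.

```python
import math


def hotelling_t2(x, mean, cov):
    """Squared statistical distance (x - mean)' Sigma^{-1} (x - mean) for two variables."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the explicit 2x2 inverse [[s22, -s12], [-s12, s11]] / det.
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det


def chi2_2df_upper(alpha):
    """Upper alpha quantile of chi-square with 2 degrees of freedom (exact: -2 ln alpha)."""
    return -2.0 * math.log(alpha)


ucl = chi2_2df_upper(0.001)  # about 13.82
print(round(ucl, 2), hotelling_t2((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))))
```

Each T² value depends only on the current observation, which is the lack of memory the text refers to.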

Large T² values are produced when the difference between the observation vector and the sample mean vector exceeds the tolerance on individual components established by the sample covariance matrix. Large values of the T² statistic also occur when the relationship between two components of the mean-adjusted observation vector does not match the covariance structure given in the sample.
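A small worked example shows the second mechanism. Assume a hypothetical pair of standardized variables with correlation 0.9 and zero mean, and compare two observations whose components are each only 1.5 standard deviations from the mean, well inside individual ±3σ limits:

```latex
\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix},
\qquad
\boldsymbol{\Sigma}^{-1} = \frac{1}{0.19}\begin{pmatrix} 1 & -0.9 \\ -0.9 & 1 \end{pmatrix}

\mathbf{x} = (1.5,\ 1.5):\quad
T^2 = \frac{2.25 + 2.25 - 2(0.9)(2.25)}{0.19} = \frac{0.45}{0.19} \approx 2.37

\mathbf{x} = (1.5,\ -1.5):\quad
T^2 = \frac{2.25 + 2.25 + 2(0.9)(2.25)}{0.19} = \frac{8.55}{0.19} = 45
```

With known parameters and α = 0.001, the chi-square limit for two variables is −2 ln(0.001) ≈ 13.82, so the first observation is consistent with the covariance structure while the second, which breaks the positive correlation, signals.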

The effects of inherent data dependencies, such as trends, load changes and autocorrelation, on the signal detection ability of the T² statistic are minimal relative to their effects on the MEWMA and MCUSUM procedures, because the T² statistic has no memory component.

We illustrate this by examining a T² control chart based on the same data used in the MEWMA plots in Figure 3; this chart corresponds to the MEWMA statistic with λ = 1. The T² chart is presented in Figure 4. Notice the absence of large values of the T² statistic relative to those seen in Figure 3 for the MEWMA statistic with λ < 1. Observe also that the T² statistic detects only three signals, representing 0.6% of the observations, which is well within what is expected when α = 0.001 and the process is in control.

### Keeping the assumption

Because the assumption that observations are independent and identically distributed is fundamental to most control procedures, we previously presented methods for detecting data dependency among the observations of either a univariate or multivariate process.6, 7 We have now shown that when such dependencies are an inherent part of the process, violation of this assumption can greatly reduce the effectiveness of many popular control procedures.
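As a simple first check for dependence (not the methods presented in references 6 and 7, which we do not reproduce here), one can compute the sample lag-k autocorrelation of a series. The sketch below uses the usual estimator; the function name is hypothetical, and the rough rule that |r_k| larger than about 2/√n suggests dependence is only an approximation for large samples.

```python
def sample_autocorrelation(values, lag=1):
    """Sample lag-k autocorrelation:
    r_k = sum_t (x_t - xbar)(x_{t+k} - xbar) / sum_t (x_t - xbar)^2.
    """
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    xbar = sum(values) / n
    dev = [v - xbar for v in values]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


print(sample_autocorrelation([1, 2, 3, 4, 5]))  # 0.4
```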

### References

1. D.C. Montgomery, Introduction to Statistical Quality Control, fifth edition, John Wiley and Sons, 2005.
2. T.P. Ryan, Statistical Methods for Quality Improvement, second edition, John Wiley and Sons, 2000.
3. Montgomery, Introduction to Statistical Quality Control, see reference 1.
4. Ryan, Statistical Methods for Quality Improvement, see reference 2.
5. Robert L. Mason and John C. Young, Multivariate Statistical Process Control With Industrial Applications, ASA-SIAM, 2002.
6. Robert L. Mason and John C. Young, "Dependent Univariate Observations and Statistical Control," Quality Progress, April 2007, pp. 62-64.
7. Robert L. Mason and John C. Young, "Detecting Dependent Observations in Multivariate Statistical Process Control," Quality Progress, September 2007, pp. 56-58.

**ROBERT L. MASON** is an institute analyst at Southwest Research Institute in San Antonio. He received a doctorate in statistics from Southern Methodist University in Dallas and is a fellow of both ASQ and the American Statistical Assn.

**JOHN C. YOUNG** is a retired professor of statistics from McNeese State University in Lake Charles, LA. He received a doctorate in statistics from Southern Methodist University.
