## 2019

**Implementing Multivariate Statistical Process Control Using
Hotelling's T^{2} Statistic**

*by* Robert L. Mason and John C. Young

Multivariate statistical process control (MVSPC) can be defined as the application of multivariate statistical procedures for the purpose of increasing the quality and productivity of a business. These techniques have found application in many areas of both the service and manufacturing industries. For example, MVSPC has been used in the monitoring of manufacturing processes, patient care, disease outbreak and customer satisfaction.

One of the most popular multivariate control procedures
is based on Hotelling's T^{2} statistic, the multivariate
analogue of the univariate Shewhart statistic.^{1}
The T^{2} statistic allows you to monitor many process variables
because it considers them as a simultaneous group of items that
interact with one another.

A control procedure based on the T^{2} statistic
takes note of the fact that a change in one variable can cause a
rippling effect throughout an entire system. Because it considers
the interrelationships among the variables, the T^{2} statistic
produces a powerful tool that is useful in detecting subtle system
changes.

Many ask how to apply the T^{2} statistic
when implementing a full-scale MVSPC. In particular, users want
to know where the control procedure should be applied in the process.
These questions, among other issues, are the topics of this article.

**Planning stage**

Implementation of a univariate Shewhart control procedure
is straightforward. After the variable to be charted is selected,
a preliminary data set consisting of observations from an in-control
process is obtained. The major purpose of this data set is to provide
estimates of the mean and standard deviation of the charting variable.
These estimates are used in establishing a preliminary control procedure
that is further used to clean the data of atypical observations.

When this purging of bad observations is accomplished, the resulting data set, labeled the historical data set (HDS), is used to obtain the sample estimates of the mean and standard deviation needed for establishing the control procedure. Thus, new observations are monitored using a control procedure based on these estimates. The construction of the HDS is referred to as a Phase I operation, and the monitoring of new observations is termed a Phase II operation.
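
As a rough illustration of the univariate Phase I and Phase II steps just described, the Python sketch below estimates the mean and standard deviation from a preliminary data set, purges points outside the preliminary limits, re-estimates the limits from what remains, and then checks new observations. The three-sigma multiplier, the single purging pass, the use of the ordinary sample standard deviation and all data values are assumptions made for the example; the article does not specify them.

```python
import numpy as np

def shewhart_limits(x, k=3.0):
    """Return (center, LCL, UCL) as the mean -/+ k sample standard deviations."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    sigma = x.std(ddof=1)  # sample standard deviation
    return center, center - k * sigma, center + k * sigma

# Hypothetical preliminary data with one atypical reading (14.0).
prelim = np.array([10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 10.1, 9.9,
                   10.0, 9.8, 10.2, 10.1, 9.9, 10.0, 14.0])

# Phase I: preliminary limits, then purge the points that fall outside them.
_, lcl0, ucl0 = shewhart_limits(prelim)
hds = prelim[(prelim >= lcl0) & (prelim <= ucl0)]

# Limits re-estimated from the historical data set (HDS).
center, lcl, ucl = shewhart_limits(hds)

# Phase II: flag new observations that fall outside the HDS limits.
new_obs = np.array([10.05, 10.6, 9.9])
signals = (new_obs < lcl) | (new_obs > ucl)
```

In practice a flagged point would be investigated before being dropped; the code automates only the arithmetic.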

Because a multivariate process involves many variables, a Phase
I operation for a T^{2} statistic involves several more
steps than one for a univariate Shewhart procedure. In general,
processes have three components: input, processing and output. This
is depicted for a hospital system in Figure 1.

Many variables are associated with each process component, so the practitioner must determine where to locate the control procedure. We suggest placing it in the area where problems already exist or where problems, if they occurred, would have serious consequences. For example, a control procedure for an industrial process might be established to detect inconsistencies in the feed stock, to track movement of the processing variables or to maintain the quality of production in the output component.

The steps involved in the planning stage of a Phase I operation are listed in Figure 2. They involve establishing goals, studying and mapping the process, and obtaining information on the variable relationships. The personnel who actually operate the system are excellent sources for this information.

**Data collection stage**

Once the plan is completed, it's time to evaluate the
preliminary data set. This stage consists of verifying the data
quality by checking for human or electronic data errors.

This can be accomplished most easily by utilizing the graphical tools of a statistical computer package. Outlying data points can be identified and, if necessary, removed, and relationships between variables can be more carefully examined.

In order to achieve a better linear relationship among the variables, it may be necessary to re-express some variables in more appropriate forms such as logarithms or power functions. Theoretical knowledge of these relationships is helpful in this effort, but if such information is unavailable or does not exist, then decisions should be based on the empirical evidence provided by the HDS.
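
To make the idea of re-expression concrete, here is a minimal sketch that compares the linear correlation between two variables before and after a logarithmic transformation. The exponential relationship and the variable names are hypothetical, chosen only so that the effect is easy to see.

```python
import numpy as np

# Hypothetical pair of process variables with an exponential relationship.
x = np.linspace(1.0, 10.0, 50)
y = 2.0 * np.exp(0.5 * x)

r_raw = np.corrcoef(x, y)[0, 1]          # linear correlation on the original scale
r_log = np.corrcoef(x, np.log(y))[0, 1]  # after re-expressing y as log(y)

print(f"raw scale: {r_raw:.3f}, log scale: {r_log:.3f}")
```

Because log(y) is an exact straight-line function of x in this constructed case, the second correlation is essentially 1; with real data the improvement would be smaller and should be judged from plots as well.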

Any remaining problems also must be considered and addressed. When information is missing, for example, one might substitute an estimate for the missing components or simply delete the observations that contain them. The operations involved in the data collection stage are presented in Figure 3.
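
The two ways of handling missing information can be sketched as follows. The data are hypothetical, and the column mean is only one simple choice of substitute estimate, used here as an assumption; the article does not prescribe a particular estimate.

```python
import numpy as np

# Hypothetical preliminary data: rows are observations, NaN marks a missing value.
data = np.array([[1.0, 2.0],
                 [np.nan, 3.0],
                 [3.0, 4.0],
                 [5.0, np.nan]])

missing = np.isnan(data)

# Option 1: delete every observation that has a missing component.
complete_cases = data[~missing.any(axis=1)]

# Option 2: substitute an estimate, here the mean of the observed values
# in the same column.
col_means = np.nanmean(data, axis=0)
imputed = np.where(missing, col_means, data)
```

Mean substitution keeps every observation but tends to understate a variable's variation and weaken its correlations, so deletion may be preferable when only a few observations are affected.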

**Detecting data problems**

The next stage in implementing a multivariate control
procedure consists of detecting data problems. Unlike the previously
mentioned data collection problems, these data problems can affect
the use and performance of the T^{2} statistic and must be thoroughly investigated.

The T^{2} statistic for an observation on
p variables, X' = (x_{1}, x_{2}, ..., x_{p}),
is given as T^{2} = (X - \bar{X})'S^{-1}(X - \bar{X}),
where the sample mean vector \bar{X} represents a measure of the
process center.

The sample covariance matrix S provides information on individual variable variation and on the correlation between the components of the observation vector. As in univariate control, both estimates are obtained from the preliminary data.
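
A minimal sketch of this calculation in Python with NumPy follows. The function name `hotelling_t2` and the simulated data are assumptions made for illustration; the code simply evaluates the quadratic form above using the sample mean vector and sample covariance matrix of a reference data set.

```python
import numpy as np

def hotelling_t2(reference, obs=None):
    """T^2 of each row of `obs` relative to the mean vector and covariance
    matrix estimated from `reference` (rows are observations)."""
    reference = np.asarray(reference, dtype=float)
    xbar = reference.mean(axis=0)                        # sample mean vector
    S = np.atleast_2d(np.cov(reference, rowvar=False))   # sample covariance matrix
    if obs is None:
        obs = reference
    obs = np.atleast_2d(np.asarray(obs, dtype=float))
    diff = obs - xbar
    # Solve S z = diff' instead of forming the inverse of S explicitly.
    z = np.linalg.solve(S, diff.T).T
    return np.einsum("ij,ij->i", diff, z)

rng = np.random.default_rng(0)
data = rng.normal(size=(30, 3))   # hypothetical preliminary data, n = 30, p = 3
t2 = hotelling_t2(data)
```

One check on such code: when the observations are the same ones used to estimate the mean and covariance, their T^{2} values always sum to (n - 1)p.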

The use of the T^{2} statistic requires that
the covariance estimate S contain no exact redundancies among the
process variables: an exact redundancy makes S singular, so S^{-1}
cannot be computed, and a near redundancy makes the inverse numerically
unstable. A redundancy arises when two variables are perfectly
(or nearly perfectly) correlated, and it can usually be
removed by deleting one of the variables from the study. Several
software packages used in MVSPC, such as QualStat and SAS,
offer procedures for locating and resolving these problems.
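
Outside those packages, a redundancy can be screened for with a few lines of NumPy. The sketch below is an illustration only, not the procedure QualStat or SAS uses: it flags variable pairs whose correlation is at or near 1 and checks the condition number of S. The 0.999 threshold and the simulated data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
x2 = rng.normal(size=40)
x3 = 2.0 * x1 + 1.0                    # exact linear function of x1
data = np.column_stack([x1, x2, x3])   # hypothetical preliminary data

# Pairwise check: correlations at or very near +/-1.
R = np.corrcoef(data, rowvar=False)
p = R.shape[0]
suspect_pairs = [(i, j) for i in range(p) for j in range(i + 1, p)
                 if abs(R[i, j]) > 0.999]   # threshold is an arbitrary choice

# Overall check: a huge condition number means S is singular or nearly so.
cond_full = np.linalg.cond(np.cov(data, rowvar=False))

# Remedy from the text: delete one variable of the redundant pair.
reduced = np.delete(data, 2, axis=1)
cond_reduced = np.linalg.cond(np.cov(reduced, rowvar=False))
```

The pairwise check catches only redundancies between two variables; the condition number also reacts when one variable is a combination of several others.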

The use of T^{2} as a control statistic for
MVSPC requires the observations to be independent. In most applications,
this requirement is easily satisfied. However, in certain industrial
applications a time dependency may exist between the observations.
This is usually labeled as a form of autocorrelation.

Numerous statistical procedures are available for detecting autocorrelation. Time-sequence plots, as presented in Figure 4, are one such popular tool.
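
As a numerical companion to a time-sequence plot, the sample autocorrelation at lag 1 can be computed directly. The sketch below assumes a single process variable recorded in time order; the simulated series and the rule-of-thumb band of 2 divided by the square root of n are illustrative assumptions, not part of the article.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample autocorrelation of a series at the given positive lag."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))

# Hypothetical readings in which each value depends on the previous one
# (a first-order autoregression with coefficient 0.8).
rng = np.random.default_rng(2)
n = 200
series = np.empty(n)
series[0] = 0.0
for t in range(1, n):
    series[t] = 0.8 * series[t - 1] + rng.normal()

r1 = autocorr(series, lag=1)
band = 2.0 / np.sqrt(n)   # rough rule-of-thumb band for independent data
print(f"lag-1 autocorrelation = {r1:.2f}, rough band = +/-{band:.2f}")
```

A lag-1 value well outside the band suggests that the independence requirement is not met for that variable.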

Detection of autocorrelation does not prohibit the use of the T^{2}
control procedure. One must simply adjust the data for the presence
of the dependency.^{2} The issues involved in
detecting data problems are summarized in Figure 5.

**Outliers**

Once all data problems have been investigated and resolved,
the user is ready to purge the preliminary data set of statistical
outliers. An outlier is an observation that is far removed from
the bulk of the data.

The T^{2} statistic is a measure of the (squared)
statistical distance that the observation vector is from the sample
mean vector. This distance is computed relative to the variable
relationships or scatter of the points as given by the covariance
matrix S.

Like straight-line (Euclidean) distance, the T^{2}
statistic is a single number, whatever the number of variables.
Observations with large T^{2} values
are potential outliers because they lie
at a great statistical distance from the data center.

To determine what is a large distance, use the probability
distribution that describes the random behavior of the T^{2}
statistic. The purging procedure consists of calculating the T^{2}
value for each observation and comparing it to a critical distance
value, labeled the upper control limit (UCL). Observations with
T^{2} > UCL are removed after investigation for cause;
all others are retained.

The process is continued until a homogeneous data
set is obtained. This data set becomes the HDS and provides the
sample mean vector \bar{X} and covariance matrix S used to construct the T^{2}
control statistic for monitoring future observations.
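
The purging loop can be sketched as below. Several assumptions are labeled here because the article does not state them: the function names, the false-alarm rate alpha = 0.001, the cap on the number of passes, the simulated data, and the specific form of the Phase I limit, which is taken as (n - 1)^{2}/n times an upper quantile of a beta distribution with parameters p/2 and (n - p - 1)/2, the form commonly given for individual observations.

```python
import numpy as np
from scipy.stats import beta

def t2_values(data):
    """T^2 of every row relative to the mean and covariance of the same data."""
    diff = data - data.mean(axis=0)
    S = np.atleast_2d(np.cov(data, rowvar=False))
    return np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T)

def phase1_ucl(n, p, alpha):
    """Beta-based limit for observations that were used in the estimates."""
    return (n - 1) ** 2 / n * beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)

def purge(data, alpha=0.001, max_passes=20):
    """Drop rows with T^2 > UCL, re-estimate, and repeat until none exceed it."""
    data = np.asarray(data, dtype=float)
    for _ in range(max_passes):
        n, p = data.shape
        keep = t2_values(data) <= phase1_ucl(n, p, alpha)
        if keep.all():
            break
        data = data[keep]
    return data

rng = np.random.default_rng(3)
prelim = rng.normal(size=(50, 2))   # hypothetical preliminary data
prelim[10] = [20.0, 20.0]           # one gross outlier planted for the demo
hds = purge(prelim)
```

The loop removes flagged rows automatically, whereas the procedure in the text calls for investigating each one for cause first; a practical version would report the flagged rows and let the analyst decide.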

**Final implementation**

Figure 6 summarizes the steps involved in constructing a historical
data set. In general, the analytical part of implementation can
be handled with software packages such as QualStat and SAS. When
these steps are completed and the HDS is constructed, monitoring
of new observations can begin. This is the start of the Phase II operation.

The corresponding T^{2} control procedure
in this phase of operations is based on a different UCL than that
used in the Phase I operation. The UCL value for a Phase I operation
is based on a beta distribution whereas the UCL value for a Phase
II operation is based on an F-distribution. Otherwise, the two procedures
are very similar.
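
For reference, the two limits are commonly written as follows for individual observations, with n observations on p variables in the HDS and a false-alarm rate of alpha. The article does not print these formulas, so the block is a sketch of the forms usually given in the T^{2} literature, assuming multivariate normal data, rather than the authors' own expressions. B and F denote upper alpha quantiles of the beta and F distributions.

```latex
% Phase I: the observation is one of the n used to estimate \bar{X} and S
\mathrm{UCL}_{\mathrm{I}} = \frac{(n-1)^{2}}{n}\, B_{\alpha;\; p/2,\; (n-p-1)/2}

% Phase II: a new observation, independent of \bar{X} and S
\mathrm{UCL}_{\mathrm{II}} = \frac{p\,(n+1)(n-1)}{n\,(n-p)}\, F_{\alpha;\; p,\; n-p}
```

The Phase II limit is the larger of the two for the same n, p and alpha, because a new observation is not constrained by having contributed to the estimates.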

After the T^{2} values for the new observations are computed,
they are compared to this new UCL. A signal is declared for an observation
when its T^{2} value exceeds the UCL. Results are exhibited in a T^{2}
chart such as the one presented in Figure 7.
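
A minimal Phase II sketch follows. It assumes the commonly used F-based limit for a single new observation, p(n + 1)(n - 1)/(n(n - p)) times an upper quantile of the F distribution with p and n - p degrees of freedom; the function name, the alpha value and the simulated HDS are assumptions made for the example.

```python
import numpy as np
from scipy.stats import f

def phase2_t2(hds, new_obs, alpha=0.001):
    """Return (T^2 values, UCL, signal flags) for new observations,
    using the mean vector and covariance matrix estimated from the HDS."""
    hds = np.asarray(hds, dtype=float)
    new_obs = np.atleast_2d(np.asarray(new_obs, dtype=float))
    n, p = hds.shape
    diff = new_obs - hds.mean(axis=0)
    S = np.atleast_2d(np.cov(hds, rowvar=False))
    t2 = np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T)
    ucl = p * (n + 1) * (n - 1) / (n * (n - p)) * f.ppf(1 - alpha, p, n - p)
    return t2, ucl, t2 > ucl

# Hypothetical HDS in which the second variable tracks the first closely.
rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)
hds = np.column_stack([x1, x2])

new_obs = np.array([[1.5, 1.5],     # follows the usual relationship
                    [1.5, -1.5]])   # each value is typical, the pair is not
t2, ucl, signals = phase2_t2(hds, new_obs)
```

The second observation would pass two separate univariate charts, since each value lies well within its variable's usual range, yet it signals on the T^{2} chart because it contradicts the relationship between the variables.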

Because of the inherent complexity of multivariate
data, implementing a multivariate control system based on a T^{2}
statistic is more complicated than initiating a univariate control
system. The benefits, however, far exceed the additional effort.
Using the T^{2} statistic with MVSPC not only allows for
the monitoring of individual variables, but also provides an excellent
technique for determining when the relationships between variables
have broken down. This has led to a multitude of successful applications
of this methodology in many different industries.

NOTES

QualStat is a product of InControl Technologies Inc.

SAS and JMP are products of SAS Institute Inc.

REFERENCES

1. Robert L. Mason and John C. Young, "Why Multivariate Statistical Process Control?" Quality Progress, December 1998, pp. 88-93.

2. Robert L. Mason and John C. Young, "Improving the Sensitivity of the T^{2} Statistic in Multivariate Process Control," Journal of Quality Technology, Vol. 31, No. 2, pp. 155-165.

**ROBERT L. MASON** *is a staff analyst in the
statistical analysis group at Southwest Research Institute in San
Antonio. He earned a doctorate in statistics from Southern Methodist
University in Dallas. Mason is an ASQ Fellow.*

**JOHN C. YOUNG** *is president of InControl
Technologies Inc. in Houston and a statistics professor at McNeese
State University in Lake Charles, LA. He earned a doctorate in statistics
from Southern Methodist University in Dallas.*
