by Robert L. Mason and John C. Young
Statistical thinking has helped industry become more aware of the benefits of using statistical procedures.[1] By definition, statistical thinking is a philosophy of learning and action based on the principles that:
- All work occurs in a system of interconnected processes.
- Variation exists in all processes.
- Understanding and reducing process variation are keys to success.
The emphasis of this viewpoint is on reducing
system variation. The common approach is to identify the key
process variables associated with variance reduction. Then, after
analysis and study, recommendations on how to reduce variation in
these variables can be made. The interrelationships between a
process and the component parts of statistical thinking are
illustrated in Figure 1.
Although it is clear that statistical thinking is a vital component of improving processes, we believe it would be useful to expand this concept. Humans are limited in their ability to process, understand and monitor many variables at one time.
Veteran process operators often claim that, even with many years of experience, our upper limit is somewhere around three variables. This shortcoming has provided the impetus for process engineers and data analysts to study and use multivariate procedures. It also suggests that combining statistical thinking and multivariate analysis procedures may eventually lead to more creative and innovative techniques for improving quality.
The Ripple Effect
Due to technological achievements made with computers, semiconductors, sensors and data storage devices, our ability to collect data has exploded. We can accumulate and maintain data files on several related variables—from our checkbooks to our workplaces. Industrial complexes regularly use advanced process control and distributed control systems to collect data simultaneously on all components of the industrial process, including input variables, process variables and output variables.
To transform these available data to information and maximize our understanding of them, we must expand our thinking to include multiple dimensions and to examine how multiple variables are interrelated. This is the basis of multivariate thinking.
Multivariate thinking is a subset of
statistical thinking that examines relationships between and
among process variables. In a multivariate process, a change in
one variable can create a ripple effect among the other
variables. For example, an increase in temperature increases the
pressure. The component parts of multivariate thinking and how
they relate to a process are presented in Figure 2.
With this philosophy, a person can determine the key variables contributing to the overall system variation, as well as the contribution each variable makes to the variation of any other variable. Many times, the variation in a particular variable is induced by other variables. For example, in a steam turbine operation, fuel usage is induced by demand.
Variance reduction in a particular variable can be examined by determining the other variables contributing to the variation. Using this approach, you would study process variation from a multivariable correlation perspective.
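One rough way to put a number on how much of a variable's variation is induced by other variables is the share of its variance explained by a least-squares fit on those variables. The sketch below uses simulated data loosely modeled on the steam turbine example; the variable names, the coefficients, the noise levels and the extra `ambient` variable are all hypothetical and are not taken from any real process.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical, simulated process data (not from a real turbine).
demand = rng.normal(100.0, 10.0, n)    # assumed load on the turbine
ambient = rng.normal(25.0, 5.0, n)     # assumed ambient temperature
fuel = 2.0 * demand + 0.5 * ambient + rng.normal(0.0, 5.0, n)

def explained_share(y, *predictors):
    """Fraction of the variance of y explained by a linear fit on the predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

share_demand = explained_share(fuel, demand)
share_both = explained_share(fuel, demand, ambient)

print(f"fuel variance explained by demand alone:   {share_demand:.3f}")
print(f"fuel variance explained by demand+ambient: {share_both:.3f}")
```

In this simulated setting most of the variation in fuel usage is induced by demand, so efforts to reduce fuel variation would start with the variation in demand rather than with fuel usage itself.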
We do not believe the definition of statistical thinking excludes the concepts of multivariate thinking, but multivariate methods and multivariate thinking skills have not been strongly emphasized in the application of the statistical thinking philosophy. We would like to draw attention to these methods and the benefits derived from their use.
There are numerous benefits of multivariate thinking and multivariate data analysis. Consider a comparison between a bivariate statistical process control procedure using Hotelling’s T² statistic and the two corresponding univariate Shewhart control procedures. The Shewhart control charts are obtained when the two variables are treated as independent variables, while the T² control region is influenced by the correlation between the two variables. The different procedures are represented in Figure 3. The ellipse represents the bivariate T² control region, which is superimposed on the two univariate Shewhart charts represented by the box.
Note point A in the plot is outside the Shewhart control limits for x2 and is outside the T² control region. Thus, the corresponding observation is judged to be out of control by both procedures.
In contrast, the observation corresponding to point B is judged to be in control if you use the Shewhart procedures for monitoring the two variables, x1 and x2. This occurs because the values of both variables fall within the control range of their respective univariate Shewhart charts. However, this point is outside the ellipse and is judged to be out of control because the relationship between the two variables is fouled.
Also, point C in the plot would be considered to be in control using the T² control region but out of control using the univariate Shewhart regions.
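The three cases can be reproduced numerically. The following Python sketch is only an illustration under stated assumptions: the in-control mean, the unit standard deviations, the correlation of 0.8, the coordinates chosen for points A, B and C, and the chi-square control limit (which treats the in-control parameters as known) are all assumed values, not figures taken from Figure 3.

```python
import numpy as np

# Assumed in-control parameters (illustrative only).
mean = np.array([0.0, 0.0])
std = np.array([1.0, 1.0])
rho = 0.8
cov = np.array([[1.0, rho],
                [rho, 1.0]])

# Assumed T² limit: chi-square quantile with 2 degrees of freedom at 0.9973.
# For 2 degrees of freedom the quantile has the closed form -2*ln(1 - p).
ucl = -2.0 * np.log(1.0 - 0.9973)

def shewhart_in_control(point, k=3.0):
    """True if each variable lies within mean +/- k standard deviations (the box)."""
    return bool(np.all(np.abs(point - mean) <= k * std))

def t2_statistic(point):
    """Hotelling's T² for a single observation: d' * inverse(cov) * d."""
    d = point - mean
    return float(d @ np.linalg.solve(cov, d))

def t2_in_control(point):
    """True if the observation lies inside the elliptical T² control region."""
    return t2_statistic(point) <= ucl

# Hypothetical coordinates chosen to mimic points A, B and C.
points = {
    "A": np.array([0.5, 3.5]),    # beyond the x2 limit and outside the ellipse
    "B": np.array([2.0, -2.0]),   # inside the box, against the correlation
    "C": np.array([3.1, 3.1]),    # beyond both limits, along the correlation
}

verdicts = {name: (shewhart_in_control(p), t2_in_control(p))
            for name, p in points.items()}

for name, (box_ok, ellipse_ok) in verdicts.items():
    print(f"{name}: Shewhart in control={box_ok}, T² in control={ellipse_ok}")
```

Point B shows why the box alone is not enough: each coordinate is unremarkable on its own, but a high x1 paired with a low x2 contradicts the positive correlation between the two variables, and only the T² statistic registers that.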
Data analysts use different types of multivariate techniques.[2] This is primarily due to the availability of inexpensive computers and software programs, many of which contain multivariate graphing tools. Figures 4 and 5 illustrate some of these graphic capabilities, including a correlated bivariate normal distribution and a 3-D plot of two histograms. Statistical software programs that contain the needed tools for conducting different types of multivariate analyses have also been developed.
Multivariate methods are generally equal or superior to univariate procedures in separating a signal (change) from the noise (variation). They treat all variables simultaneously and can extract information on how the variables are changing relative to one another. Multivariate techniques can also be used to determine how the variables change individually.
Because data collection techniques have developed faster than multivariate analysis techniques, data are only occasionally analyzed using these well-developed procedures. Many multivariate techniques have been available for more than 60 years. For example, principal component analysis was developed in the mid-1930s, while multivariate control procedures were used in the 1940s. Multiple regression procedures based on many variables have been around even longer.
Despite these developments, multivariate thinking and the application of multivariate techniques have generally not been applied in many industry and business situations. It is difficult for people to overcome the traditional one-variable-at-a-time viewpoint because it is still taught in many of our universities at both the undergraduate and graduate levels. This promotes a lack of confidence in many of the multivariate methods because they cannot be simplified like those used in the analysis of a single variable.
This lack of training has had the most significant impact on the nonuse of multivariate methods. Who manages a business or an industrial complex? Generally, it is an individual with a bachelor’s or master’s degree in business or finance, though in industry the CEO often is a physical scientist or an engineer. The backgrounds and workloads of these leaders provide little opportunity for them to obtain training in multivariate thinking and the application of multivariate techniques.
Another reason for the lack of popularity of multivariate methods is their complexity. Most multivariate methods are not easy to understand, and without proper training, they can be difficult to apply.
An example of this problem is illustrated in
Figure 6, which shows scatter plots of four different sets of
bivariate data, each with the same correlation coefficient of 0.7
between the two variables. As illustrated by the different trends
in the plots, using a pairwise correlation coefficient without
graphically investigating the true relationship between the two
variables can be misleading.
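The same point can be demonstrated with constructed data. The Python sketch below builds four data sets whose sample correlation with x is 0.7 by construction, yet whose scatter patterns differ markedly. The four patterns (random cloud, curve, single outlier, step) and the helper `with_exact_correlation` are hypothetical choices made for illustration; they are not the data sets plotted in Figure 6.

```python
import numpy as np

def with_exact_correlation(x, shape, r):
    """Return y whose sample correlation with x equals r (up to rounding),
    while the part of y not explained by x follows the pattern in `shape`."""
    zx = (x - x.mean()) / x.std()
    s = shape - shape.mean()
    # Remove the part of the pattern that is linearly explained by x.
    resid = s - (s @ zx) / (zx @ zx) * zx
    ze = resid / resid.std()
    # zx and ze are uncorrelated with unit variance, so corr(x, y) = r.
    return r * zx + np.sqrt(1.0 - r**2) * ze

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)

outlier = np.zeros_like(x)
outlier[-1] = 1.0                       # one extreme observation

shapes = {
    "random cloud": rng.normal(size=x.size),
    "curve": (x - 5.0) ** 2,
    "single outlier": outlier,
    "step": (x > 5.0).astype(float),
}

correlations = {}
for name, shape in shapes.items():
    y = with_exact_correlation(x, shape, r=0.7)
    correlations[name] = np.corrcoef(x, y)[0, 1]
    print(f"{name:15s} correlation = {correlations[name]:.4f}")
```

Plotting each y against x would show four visibly different relationships, which is why the coefficient should be read alongside a scatter plot rather than in place of one.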
Overcoming These Challenges
The development of more user-friendly software could ease some of these problems. However, if the use of multivariate thinking and the application of multivariate techniques are to become commonplace in industry, then we must direct our energy toward overcoming these challenges. Some ways to do this include:
- Becoming aware of the benefits of multivariate thinking, one of the most important of which is that it allows you to see how the many variables of a system or process fit together.
- Asking data analysts to give presentations on multivariate thinking to all who will listen.
- Asking universities to upgrade their course offerings in applied multivariate analysis at the undergraduate level to nonscience majors. Most college curricula require a basic course in statistics. If and when possible, this requirement should be expanded to include a basic course in applied multivariate methods.
- Finding better ways to present multivariate results. Many employees have a fear of multivariate methods because of the misconception that the procedures are difficult to understand and apply. Those working in statistics need to learn to present multivariate results in a format that is easy to understand.
Multivariate thinking includes thinking in multiple dimensions and examining how multiple variables fit together. The greatest benefit of multivariate thinking and the application of multivariate methods is that they allow one to understand how variables are interconnected and interrelated. This important knowledge is well worth the effort needed to apply multivariate techniques and will ultimately lead to a better understanding of process variation and how it can be reduced.
- Roger Hoerl and Ronald D. Snee, Statistical Thinking: Improving Business Performance, Duxbury, 2002.
- R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, fourth edition, Prentice Hall, 1998.
- Mason, R.L., and J.C. Young, Multivariate Statistical Process Control With Industrial Applications, ASA-SIAM, 2002.
ROBERT L. MASON is an institute analyst at Southwest Research Institute in San Antonio, TX. He received a doctorate in statistics from Southern Methodist University and is a Fellow of both ASQ and the American Statistical Assn.
JOHN C. YOUNG is president of InControl Technologies and a professor of statistics at McNeese State University in Lake Charles, LA. He received a doctorate in statistics from Southern Methodist University.