## 2019

STATISTICS ROUNDTABLE

# Monitor Your Industrial Processes

**by Robert L. Mason and John C. Young**

Models are often developed in industry to
characterize and explain a process because they can show how
process variables are interrelated.
Historically, two particular methods have been used to construct
models to characterize an industrial process. Each method not
only uses different procedures to determine the model
coefficients, but also uses the resultant models in different
ways (see Table 1).

### Theoretical Models

The first model building method is derived from the mathematical theory and physical laws that govern the process. This type of model is primarily used in the design and construction of the processing unit. It also can be used as a performance model in certain situations.

Theoretical models, although correctly based on underlying mathematical principles, often do not perform well in applications. This is because the higher level mathematics used to develop them often must be oversimplified in application and thus may not account for a substantial amount of variation in the data. And, unless individually corrected, such models also fail to account for the individual idiosyncrasies of a processing unit.

### Empirical/Regression Models

The second approach to model building is data
based and uses statistical procedures. With this approach, a
model is empirically fit using a baseline data set that
represents good operations for the processing unit. The simplest
version of this model is linear in form and is obtained using
common regression techniques.^{1}

The regression approach requires knowledge of statistical methods and experimental design. Regression models are appealing because little process information is required in the development and implementation stages. In many process situations, regression models outperform theoretically designed models and, unlike their mathematical counterparts, allow for adjustments for the inherent differences that exist between supposedly identical units.

### Simple Regression Model

The simplest form of a regression model, used
to relate a response variable, $y$, to a set of $p$ predictor
variables, is

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon \quad \text{(equation one)}.$$

In this equation, $x_1, x_2, \dots, x_p$ represent the predictor variables, and $\beta_1, \beta_2, \dots, \beta_p$ represent their respective unknown coefficients. The intercept is represented by $\beta_0$, and the error is denoted by $\varepsilon$.

Good regression models depend on the strength of the linear relationships between the response and predictor variables and on the availability of a good historical or baseline data set. The stronger the linear relationships, the better the model.

A major side benefit of the regression model approach is that it allows the user to explain the variation of the response variable in terms of the predictors. This is useful when you are studying how a quality or output variable is related to a set of process variables. If the regression is good, you can account for the variation in the response variable by examining the predictor variables.
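As a rough illustration of how a model of the form in equation one is estimated and judged, the sketch below fits a linear model by ordinary least squares and reports the share of response variation it explains. The data are synthetic and the helper name `fit_linear_model` is hypothetical; nothing here reproduces the authors' data or software.

```python
import numpy as np

def fit_linear_model(X, y):
    """Return (coefficients, r_squared) for y = b0 + b1*x1 + ... + bp*xp."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation
    return coef, 1.0 - ss_res / ss_tot

# Synthetic baseline data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))
y = 5.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0.0, 0.5, size=200)

coef, r_squared = fit_linear_model(X, y)
print("intercept and slopes:", np.round(coef, 3))
print("R^2:", round(r_squared, 4))
```

Because the synthetic response is strongly linear in the predictors, the estimated coefficients land close to the values used to generate the data and the explained share of variation is high, which is the situation the text describes as a good regression.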

Consider a process in which electricity, measured by megawatt (mw) production, is being generated as a source of power for industrial use. High energy steam is produced in a boiler system and used to run a turbine generator. The high temperature, high pressure steam turns the turbine generator and produces the electricity. Latent steam from the turbine is changed back into water in a condenser and is returned to the boiler.

The mw production of the turbine is modeled
using steam flow (stmfl), steam temperature (stmtp), condenser
temperature (cdtp) and absolute pressure of the condenser (cdpr)
as the predictor variables. Using a baseline set of data
collected during a period of good operations, an estimate of the
regression model in equation one is obtained using regression
techniques^{2} and is given by

$$\text{mw} = -21.09 + 0.113\,\text{stmfl} + 0.029\,\text{stmtp} - 0.036\,\text{cdtp} - 0.330\,\text{cdpr} \quad \text{(equation two)}.$$
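To make the fitted equation concrete, the following sketch evaluates equation two at a single operating point. The coefficients are those quoted in the text; the readings are invented for illustration, since the article does not list the baseline data or its units, and `predict_mw` is a hypothetical helper name.

```python
# Estimated coefficients of equation two, as quoted in the text.
INTERCEPT = -21.09
COEFFICIENTS = {"stmfl": 0.113, "stmtp": 0.029, "cdtp": -0.036, "cdpr": -0.330}

def predict_mw(readings):
    """Predicted megawatt production for a dict of predictor readings."""
    return INTERCEPT + sum(COEFFICIENTS[name] * readings[name] for name in COEFFICIENTS)

# Hypothetical operating point (assumed values, not from the article).
readings = {"stmfl": 400.0, "stmtp": 900.0, "cdtp": 100.0, "cdpr": 2.0}
predicted = predict_mw(readings)

# Raising steam flow by one unit, all else equal, shifts the prediction
# by the steam flow coefficient.
bumped = predict_mw({**readings, "stmfl": readings["stmfl"] + 1.0})

print(round(predicted, 2), round(bumped - predicted, 3))
```

For these assumed readings the prediction is about 45.95, and a one-unit increase in steam flow raises it by 0.113, the steam flow coefficient.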

The squared correlation coefficient for this
equation is $R^2 = 0.991$. This value indicates 99.1% of the
variation in mw production can be accounted for by examining the
four predictor variables included in equation two. Further
analysis indicates steam flow is the most critical production
variable, followed by steam temperature. Next in order of
importance are the condenser temperature and condenser pressure
variables. Determining the most important variables that
contribute to the variation gives you an initial place to look
when there are disruptions in the process.

### Another Use

A regression model, such as the one in equation two, also can be used to predict values of the response variable. Being able to predict with a high degree of accuracy a future value of an important process variable is useful when monitoring a process. Many applications that involve prediction actually depend on the size of the error made in the prediction. This particular type of error is referred to as the residual error and is defined as the difference between the actual observed value of the response variable, $y_a$, and the predicted value of the variable, $y_p$, as obtained from the regression equation.

The residual error for a given sample point
with coordinates $(x_a, y_a)$ for a straight line regression equation
is depicted in Figure 1. Note the error is measured
vertically, parallel to the y-axis, not perpendicular to the regression line.
A small residual value implies the regression equation provides a
good prediction. In other words, the predicted value is close to
the actual value. A large residual indicates poor prediction and,
in some situations, indicates a process upset.

The residuals from an estimated regression
model provide an excellent multivariate control chart^{3} statistic.
The associated charting procedure is simple to construct and
implement. For example, reconsider the steam turbine example, in
which the fuel used to produce the steam in the boiler is now
monitored by examining the residual error of an estimated
input-output (I/O) model for the steam turbine system. The I/O
model for the boiler is given by

$$\text{fuel} = \beta_0 + \beta_1\,\text{stmfl} + \beta_2\,\text{stmtp} + \beta_3\,\text{cdtp} + \beta_4\,\text{cdpr} + \varepsilon \quad \text{(equation three)}.$$

Using a good baseline data set, an estimate of the model in equation three is given by

$$\text{fuel} = -86.29 + 1.22\,\text{stmfl} + 0.08\,\text{stmtp} + 0.37\,\text{cdtp} - 0.01\,\text{cdpr} \quad \text{(equation four)}.$$

Note the estimated model expresses the input fuel to the system as a function of the output steam production as measured by steam flow and steam temperature. The condenser also is considered an integral part of the system because it is used to return the water to the boiler. Thus, the condenser temperature and the absolute pressure of the condenser also are included as predictor variables in equation three.

A typical residual error plot of the fuel usage
for steam production for this control chart example is presented
in Figure 2. The residual errors plotted in the graph are
obtained for each observation by taking the difference between
the actual observed fuel value and the fuel amount predicted by
the estimated I/O model given in equation four. The
$R^2$ statistic for this regression equation is 0.993. It
indicates equation four provides an excellent fit to the data
because it explains 99.3% of the fuel variation.

The estimated standard error of
prediction (the average amount of error contained in each
prediction for this equation) is 8.29. This is a small,
single digit error value relative to the fuel values, which are
measured in the hundreds of units per hour. The control limits for the
plotted residual errors in Figure 2 are established at ±3
standard errors, or ±3(8.29) = ±24.87.
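The residual check behind such a chart can be sketched in a few lines. The coefficients of equation four and the standard error of 8.29 come from the text; the predictor readings and observed fuel values are hypothetical, and the function names are illustrative rather than part of any described software.

```python
# Estimated coefficients of equation four and the standard error quoted in the text.
FUEL_INTERCEPT = -86.29
FUEL_COEFFICIENTS = {"stmfl": 1.22, "stmtp": 0.08, "cdtp": 0.37, "cdpr": -0.01}
STANDARD_ERROR = 8.29
CONTROL_LIMIT = 3 * STANDARD_ERROR  # limits sit at plus and minus this value

def predict_fuel(readings):
    """Fuel usage predicted by the estimated I/O model."""
    return FUEL_INTERCEPT + sum(
        FUEL_COEFFICIENTS[name] * readings[name] for name in FUEL_COEFFICIENTS
    )

def check_observation(readings, observed_fuel):
    """Return (residual, in_control) for one observation."""
    residual = observed_fuel - predict_fuel(readings)  # actual minus predicted
    return residual, abs(residual) <= CONTROL_LIMIT

# Hypothetical readings and fuel observations (assumed, not from the article).
readings = {"stmfl": 400.0, "stmtp": 900.0, "cdtp": 100.0, "cdpr": 2.0}
for observed_fuel in (520.0, 540.0):
    residual, in_control = check_observation(readings, observed_fuel)
    print(f"fuel={observed_fuel:.1f} residual={residual:+.2f} in_control={in_control}")
```

With these assumed readings the model predicts a fuel value of about 510.69, so an observed value of 520 gives a residual inside the limits, while an observed value of 540 gives a residual beyond +24.87 and would be flagged.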

The I/O regression in equation four provides an excellent estimate of the relationship between fuel usage and the predictor variables: steam flow, steam temperature, condenser temperature and condenser pressure. When the residuals vary between the specified upper and lower control limits, you can conclude the run conditions of the unit match the good operation conditions of the baseline data. If a residual value lies outside the limits, you can conclude an upset condition has occurred in the process.

The logic behind this charting procedure is
simple. If the present operational conditions match the baseline
and you have a good model, as judged by the $R^2$ value and the size
of the standard error of prediction, then the regression model will
accurately predict the fuel used in steam production with a small
residual error. Conversely, a large residual (in absolute value) that
plots outside the control limits will result when the operating
conditions differ radically from the baseline conditions. A run
of successive small residual values with the same sign, all above
or all below the center line at zero, will also result when minor
upset conditions are present.
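One possible way to look for the same-sign sequences described above is to track the longest run of consecutive residuals on one side of zero. The column does not say how long a run must be before it signals a minor upset; the threshold of eight used below is an assumption borrowed from common run rules, and the residual values are made up.

```python
def longest_same_sign_run(residuals):
    """Length of the longest run of consecutive residuals on one side of zero."""
    longest = current = 0
    previous_sign = 0
    for value in residuals:
        sign = (value > 0) - (value < 0)  # +1, -1, or 0 for an exact zero
        if sign != 0 and sign == previous_sign:
            current += 1
        else:
            current = 1 if sign != 0 else 0  # a zero residual breaks any run
        previous_sign = sign
        longest = max(longest, current)
    return longest

RUN_LENGTH_THRESHOLD = 8  # assumed value; the article gives no specific rule

# Hypothetical residuals: all well inside the control limits, but drifting positive.
residuals = [-2.1, 1.4, -0.8, 3.0, 2.2, 4.1, 1.9, 2.7, 3.3, 0.6, 2.4, 1.1]
run = longest_same_sign_run(residuals)
print(run, run >= RUN_LENGTH_THRESHOLD)
```

None of these residuals would breach a ±24.87 limit, yet the last nine all sit above zero, which is the kind of pattern the text associates with a minor upset.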

The methodology described in this column depends on your ability to construct good regression models using baseline data. If this is not possible, the residuals may be too variable to detect trend changes. To ensure a good fit, obtain a stable historical data set that is not contaminated by outliers, missing data or strong linear relationships between subsets of the predictor variables. It is also useful to choose important input variables that reflect the process operations.
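One common screen for strong linear relationships among predictors is the variance inflation factor (VIF), which the column itself does not mention; it is offered here only as one possible check. The sketch uses the identity that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The data are synthetic, and the cutoff of 10 is a widely used rule of thumb, not a value from the article.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each predictor column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic predictors (assumed for illustration only).
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.05 * rng.normal(size=500)  # nearly a linear combination of x1 and x2

independent = np.column_stack([x1, x2])
collinear = np.column_stack([x1, x2, x3])

vif_independent = variance_inflation_factors(independent)
vif_collinear = variance_inflation_factors(collinear)
print(np.round(vif_independent, 2))
print(np.round(vif_collinear, 1))
```

Values near 1 suggest a predictor is not well explained by the others, while values far above the assumed cutoff of 10, as in the second set, point to the kind of relationship among predictors that the text warns against in a baseline data set.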

### REFERENCES

1. T.P. Ryan, Modern Regression Methods, John Wiley & Sons, 1997.
2. Ibid.
3. R.L. Mason and J.C. Young, “Improving the Sensitivity of the $T^2$ Statistic in Multivariate Process Control,” Journal of Quality Technology, 1999, pp. 155-165.

**ROBERT L. MASON** is an
institute analyst at Southwest Research Institute in San Antonio.
He received a doctorate in statistics from Southern Methodist
University in Dallas and is a Fellow of ASQ.

**JOHN C. YOUNG** is
president of InControl Technologies and a professor of statistics
at McNeese State University in Lake Charles, LA. He received a
doctorate in statistics from Southern Methodist
University.
