A Way to Generate Control

by Robert L. Mason and John C. Young

A regression model can be a useful tool for monitoring a process. In an earlier article, we suggested using a regression model to study the linear relationship among the variables of a multivariate process.¹

The discussion centered on the residuals resulting from an estimated regression model and how these residuals could be used to indicate out-of-control situations in a process. Let’s revisit regression models and demonstrate, with some examples, how to apply them to improve the monitoring of a process.


Consider an industrial process in which the variable being examined cannot be measured directly. For example, it is very difficult to directly measure the performance of a critical catalyst in certain types of chemical reactors. In these settings, catalyst activity is an inferred property rather than a measured one, because no single variable directly indicates the capability of the catalyst to perform its function.

However, the catalyst’s performance can be measured using a computed variable. For instance, if the function of the catalyst is to help reduce an impurity contained in the feed to the reactor, then the percentage reduction of the impurity in the feed can be used as the response variable to model catalyst performance.

The key predictor variable that has a relationship with the percentage reduction variable is the temperature of the chemical reactor. As the percentage reduction in the impurity decreases, the temperature increases. However, this relationship is only moderate, since a linear regression equation based on this one predictor variable would account for at most 25% of the variation in the percentage reduction variable.

Autoregressive models predict the present value of a response variable from a past value of the same variable. Since the chemical reactor in the above example is characterized as a decay process, some type of autoregressive model might improve the prediction capability of the regression model.

For example, consider the autoregressive model given by:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 y_{i-1} + \varepsilon_i.$$

In this equation, $y_i$ is the present value of the percentage reduction, $y_{i-1}$ is the immediate past value of the percentage reduction and $x_i$ is the reactor temperature variable. Using the immediate past value of the percentage reduction and the reactor temperature as the two predictor variables in the model in the above formula will:

  • Increase the value (in percentage) of the squared correlation coefficient, R², of the estimated regression equation to 88.8%.
  • Yield a small standard error (0.03) of prediction.

These statistics indicate the model is sufficient to use for reliable prediction. Figure 1 shows a scatter plot of the predicted value of the percentage reduction vs. the corresponding actual value of the percentage reduction.
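The fit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic, the coefficient values used to generate them are invented, and the function names (`solve_linear_system`, `fit_lagged_regression`) are hypothetical helpers rather than anything from the original study. The sketch is not expected to reproduce the 88.8% or 0.03 figures quoted above.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_lagged_regression(x, y):
    """Least-squares fit of y[i] = b0 + b1*x[i] + b2*y[i-1] + error."""
    rows = [[1.0, x[i], y[i - 1]] for i in range(1, len(y))]
    target = y[1:]
    p = 3
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * t for r, t in zip(rows, target)) for j in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    r_squared = 1.0 - ss_res / ss_tot
    std_error = (ss_res / (len(target) - p)) ** 0.5  # residual standard error
    return coef, r_squared, std_error


# Synthetic data (assumed values): temperature drifts upward while the
# percentage reduction slowly decays, as in a catalyst decay process.
random.seed(1)
temperature = [300.0 + 0.1 * i + random.gauss(0.0, 1.0) for i in range(200)]
reduction = [1.0]
for i in range(1, 200):
    reduction.append(0.5 - 0.001 * temperature[i] + 0.8 * reduction[i - 1]
                     + random.gauss(0.0, 0.005))

coef, r_squared, std_error = fit_lagged_regression(temperature, reduction)
print("b0, b1, b2:", [round(c, 4) for c in coef])
print("R^2:", round(r_squared, 3), "residual std. error:", round(std_error, 4))
```

The first observation is dropped from the fit because it has no past value to serve as a predictor.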


The residuals from an estimated regression model have many benefits, including providing the means to examine the influence of one predictor variable after adjusting for the effect of other predictor variables.

Consider an industrial process in which steam is the primary power source. High energy steam is produced in a boiler and made available to the processing units. This process is a closed system and disallows the introduction of impurities to the system.

Steam is made from boiling pure water. After most of the energy from the steam is used in the processing units, the latent steam is returned to a condenser for transformation back to water, which, in turn, is moved back to the boiler and reused to make more steam. Maintaining pressure on the condenser helps move the latent steam to it for condensing.

The time sequence plot at the bottom half of Figure 2 illustrates the fluctuation in the absolute pressure (in standardized units) maintained on a condenser. The pressure is a critical variable in assessing the performance of the condenser. It would be worthwhile to monitor this variable and use it as an indicator of condenser performance. However, the presence of serial correlation (that is, correlation across time) in the plot makes controlling this variable nearly impossible.

The time sequence plot at the top half of Figure 2 includes a plot of the river water temperature (in degrees Fahrenheit). Notice the cyclical pattern present in the temperature plot. These temperature conditions affect the performance of the condenser. This is evident from the presence of the induced serial correlation in the plot of the absolute pressure.

Also, note the absolute pressure is low when the river water temperature is low. For higher temperatures, the pressure increases. It is difficult, if not impossible, to obtain a clear understanding of the effects of absolute pressure on the process in the presence of the river temperature condition. This makes the implementation of a control procedure to monitor this critical variable more difficult.

A successful control procedure can be obtained by adjusting the absolute pressure for the effect of the river water temperature. We can do this by creating a regression model of absolute pressure (the response variable) as a function of river water temperature (the predictor variable) and examining the residuals from the estimated regression model. The residual value represents the difference between the observed and predicted absolute pressure after adjusting for the effect of the river water temperature.

A plot of these residuals across time is presented in Figure 3 along with Shewhart upper and lower control limits. Observe the cyclical pattern due to the river water temperature has been removed, although effects from other variables still may be present. Similarly, the effects of other data characteristics, such as autocorrelation, can be removed by examining the residuals of the appropriate model.
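A minimal Python sketch of this adjustment follows. The data are synthetic and every numeric value is an assumption made for illustration. The control limits are set at three sample standard deviations around the mean of the residuals, which is one simple way to build Shewhart-style limits and is not necessarily the calculation behind Figure 3. The helper names `fit_line` and `shewhart_limits` are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def shewhart_limits(values, k=3.0):
    """Return (LCL, center line, UCL) as mean -/+ k sample std. deviations."""
    n = len(values)
    center = sum(values) / n
    sd = math.sqrt(sum((v - center) ** 2 for v in values) / (n - 1))
    return center - k * sd, center, center + k * sd


# Synthetic year of daily data (assumed values): a seasonal river
# temperature cycle that drives the condenser's absolute pressure.
random.seed(2)
days = range(365)
river_temp = [60.0 + 15.0 * math.sin(2.0 * math.pi * d / 365.0)
              + random.gauss(0.0, 1.0) for d in days]
pressure = [0.05 * (t - 60.0) + random.gauss(0.0, 0.3) for t in river_temp]

# Adjust pressure for temperature: keep only what the model cannot explain.
intercept, slope = fit_line(river_temp, pressure)
residuals = [p - (intercept + slope * t) for t, p in zip(river_temp, pressure)]

lcl, center, ucl = shewhart_limits(residuals)
out_of_control = [d for d, r in zip(days, residuals) if r < lcl or r > ucl]
print("LCL:", round(lcl, 3), "UCL:", round(ucl, 3))
print("days outside the limits:", out_of_control)
```

By construction, least-squares residuals average zero and are uncorrelated with the predictor, which is why the seasonal cycle no longer appears in them.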


In general, examining a single process variable is not sufficient to assess the performance of a processing unit. In most cases, a regression model based on a group of process variables is needed to describe what enters and leaves the processing unit. As discussed in the earlier article, such a model is labeled an input/output (I/O) model. The I/O model expresses the input to the system as a function of the output and other associated process variables.

Figure 4 presents the residual plot for the I/O model of the boiler system. Using this model, fuel to the boiler is expressed as a function of two predictor variables: the steam produced and the condenser performance.

The residual plot presented in Figure 4 can be expanded further. Suppose there is an increase of 5% in fuel usage to the boiler. The effect of this increase is illustrated in the right-hand side of the residual plot in Figure 5. After observation number 300, almost all of the residuals plot above the UCL. This indicates some type of upset has occurred in the fuel usage process, because we are no longer able to predict with the previous accuracy.
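The monitoring logic in this scenario can be sketched in Python as follows. The sketch fits a two-predictor model on a baseline period, freezes the coefficients and the upper control limit, and then scores later observations against them. All data are simulated, the coefficient values are invented, and `fit_two_predictors` is a hypothetical helper; the only detail taken from the text is the 5% increase in fuel usage after observation 300.

```python
import math
import random


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using centered sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Simulated process (assumed values): fuel depends on steam produced and
# on a condenser performance measure.
random.seed(3)
n_total, n_baseline = 400, 300
steam = [random.gauss(100.0, 5.0) for _ in range(n_total)]
condenser = [random.gauss(0.0, 1.0) for _ in range(n_total)]
fuel = [10.0 + 0.8 * s + 1.5 * c + random.gauss(0.0, 0.5)
        for s, c in zip(steam, condenser)]

# Upset: fuel usage rises by 5% after observation 300.
fuel = [f * 1.05 if i >= n_baseline else f for i, f in enumerate(fuel)]

# Fit the I/O model on the baseline period only.
b0, b1, b2 = fit_two_predictors(steam[:n_baseline], condenser[:n_baseline],
                                fuel[:n_baseline])
residuals = [f - (b0 + b1 * s + b2 * c)
             for s, c, f in zip(steam, condenser, fuel)]

# Upper control limit from the baseline residuals (mean + 3 std. deviations).
base = residuals[:n_baseline]
base_mean = sum(base) / n_baseline
base_sd = math.sqrt(sum((r - base_mean) ** 2 for r in base) / (n_baseline - 1))
ucl = base_mean + 3.0 * base_sd

baseline_rate = sum(r > ucl for r in base) / n_baseline
upset_rate = sum(r > ucl for r in residuals[n_baseline:]) / (n_total - n_baseline)
print("share above UCL, baseline:", round(baseline_rate, 3))
print("share above UCL, upset:", round(upset_rate, 3))
```

Keeping the baseline coefficients fixed is the design choice that makes the upset visible: refitting on data that include the upset would absorb part of the shift into the model.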

The shaded region of the plot in Figure 5 represents the upset time period. A positive residual for this period indicates more fuel was being used by the boiler than was used during the baseline situation. Suppose the cause of the increased fuel usage is determined, but making the necessary repairs would require shutting the unit down and operating without the electricity that is being produced. To continue operations, the plant would be required to buy electricity from external sources.

Suppose an outage is scheduled within the next three months and electricity for use during the outage has already been purchased at a reduced cost. This possibility causes the plant manager to consider whether the regression model can be used to help make the decision to:

  • Bring the unit down now, repair it and operate with purchased electricity.
  • Continue to pay the additional cost for the inefficiency in steam production and wait until the scheduled outage to take advantage of the “low” cost electricity.

The unit of measurement for the residual is the same as the unit of measurement for the fuel, which has a dollar value. The shaded area under the residual plot in Figure 5 represents the additional daily cost (or total additional cost) of operating the unit with the inefficiency. This area can be determined and its dollar value established.

If the increased cost of operating with the upset is less than the added cost of buying the electricity instead of producing it, then the best decision would be to run the unit with the upset conditions. Otherwise, the best decision would be to bring the unit down for repair and buy electricity to operate the plant in the meantime.
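The comparison can be written out as a short Python sketch. Every number here is hypothetical (the daily residuals, the fuel price, the days remaining and the added electricity cost), and the function names are illustrative. The sketch also assumes the observed average excess fuel use persists unchanged until the scheduled outage.

```python
def extra_operating_cost(daily_residuals, fuel_price, days_remaining):
    """Dollar value of the excess fuel if the upset persists until the outage.

    daily_residuals: excess fuel per day during the upset, in fuel units
    fuel_price:      dollars per fuel unit
    days_remaining:  days until the scheduled outage
    """
    mean_excess = sum(daily_residuals) / len(daily_residuals)
    return mean_excess * fuel_price * days_remaining


def choose_action(daily_residuals, fuel_price, days_remaining,
                  added_electricity_cost):
    """Compare the cost of running with the upset to the cost of repairing now."""
    run_cost = extra_operating_cost(daily_residuals, fuel_price, days_remaining)
    if run_cost < added_electricity_cost:
        return "run until scheduled outage", run_cost
    return "repair now", run_cost


# Hypothetical inputs for illustration only.
upset_residuals = [4.2, 4.6, 4.4, 4.8]   # fuel units per day above baseline
action, cost = choose_action(upset_residuals, fuel_price=50.0,
                             days_remaining=90,
                             added_electricity_cost=30000.0)
print(action, round(cost, 2))
```

With these assumed inputs the excess fuel costs about $20,250 over 90 days, which is below the assumed $30,000 premium for buying electricity now, so the sketch favors running until the outage.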

Many other goals, such as optimization, adding new process technology and cost reductions, are readily attainable through the construction of regression models and the examination of the resulting residuals.

Statistical models also have been used to dissect and analyze complex processes based on nonlinear programming procedures, neural nets and fuzzy logic. Although such models have had limited success in these situations, improvements are expected with future theoretical developments.


  1. Robert L. Mason and John C. Young, “Monitor Your Industrial Processes,” Quality Progress, Vol. 39, No. 4, pp. 89-90.

ROBERT L. MASON is an institute analyst at Southwest Research Institute in San Antonio, TX. He received a doctorate in statistics from Southern Methodist University and is a fellow of both ASQ and the American Statistical Assn.

JOHN C. YOUNG is president of InControl Technologies and a professor of statistics at McNeese State University in Lake Charles, LA. He received a doctorate in statistics from Southern Methodist University.
