The Pros of Proactive Product Servicing
Monitoring maintenance helps avoid unexpected shutdowns
By Necip Doganaksoy, Gerald J. Hahn and William Q. Meeker
Just as athletes can experience an injury that takes them out of a game, systems can experience component failures that require downtime and repair.
We will describe three approaches [1] that can minimize the cost, inconvenience or potential danger of field failures through statistically based proactive servicing (or just-in-time maintenance).
A prime goal of proactive product servicing is to avoid unscheduled shutdowns. It is usually much less disruptive and costly to perform repairs during scheduled maintenance on automobiles, aircraft engines, locomotives and medical scanners, for example, than it is to deal with unexpected failures in the field.
The emergence of long-term service agreements—manufacturers selling not just products, but also guaranteed ongoing levels of service to customers—has further encouraged proactive servicing and increased manufacturers’ desire to build high reliability into design in the first place.
Even when a shutdown cannot be prevented, proactive servicing is still useful to ensure speedy, inexpensive repair, which can reduce the deleterious impact of such shutdowns.
The three approaches for proactive product servicing are:
- Optimum product maintenance scheduling.
- Proactive parts replacement.
- Automated monitoring for impending failures.
Optimum product maintenance scheduling
Many systems, from automobiles to aircraft engines, are serviced periodically. Examples are automobile oil changes, scheduled thermal barrier coating of turbine components and the replacement of filters in air conditioners.
Routine maintenance should be scheduled to provide an optimum trade-off between the cost and inconvenience of servicing, and the likely greater cost and inconvenience due to unscheduled failures that servicing could have averted.
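One way to make this trade-off concrete is a simple cost-rate model. The sketch below is an illustration under stated assumptions, not a method described in this column: it assumes that each service restores the system to as-new condition, that the expected number of unscheduled failures in a service interval of length T follows a power law (T/eta)^beta, and that all cost figures and parameter values are hypothetical.

```python
import math


def cost_rate(interval, c_service, c_failure, eta, beta):
    """Expected cost per unit of usage when servicing every `interval` units.

    Assumption (not from the column): the expected number of unscheduled
    failures within one interval is (interval / eta) ** beta.
    """
    expected_failures = (interval / eta) ** beta
    return (c_service + c_failure * expected_failures) / interval


def optimal_interval(c_service, c_failure, eta, beta):
    """Interval that minimizes cost_rate, obtained by setting its derivative to zero."""
    if beta <= 1:
        # With a constant or decreasing failure intensity, servicing more
        # often never pays for itself in this model.
        raise ValueError("a finite optimum requires beta > 1")
    return eta * (c_service / (c_failure * (beta - 1))) ** (1 / beta)


if __name__ == "__main__":
    # Hypothetical values: $50 per service, $2,000 per unscheduled failure,
    # eta = 20,000 miles, beta = 2.
    best = optimal_interval(50, 2000, 20000, 2)
    print(f"service about every {best:,.0f} miles")  # about 3,162 miles
```

In this model, a higher failure cost or a steeper increase in failure intensity (larger beta) shortens the recommended interval, while a higher servicing cost lengthens it.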
Consider an automobile maintenance scheduling example; similar considerations apply for other systems.
Automobile manufacturers traditionally advised buyers to change oil and lubricate car parts every 3,000 miles. Using modern sensor technology, however, automobile maintenance scheduling can be improved by taking into account how the car is operated, considering factors such as driving speed, number of stops and number of cold starts.
Products used more harshly require more frequent servicing. This has led to the development of systems that determine the frequency of routine maintenance based on driving and cost considerations.
Dynamic maintenance scheduling, or so-called condition-based maintenance (CBM) [2], extends this concept.
For automobiles, CBM might involve monitoring oil degradation and other measures of deterioration to determine the timing of the next maintenance.
The car operator is then advised, perhaps upon car start-up, of the next recommended servicing.
Proactive parts replacement
Many system field failures occur because relatively inexpensive parts wear out, causing unscheduled shutdowns with costs that are far greater than the cost of the part.
To avoid this, vulnerable parts should be replaced with new ones at strategically selected times during routine maintenance.
Sometimes, an impending failure can be detected by inspecting the part or by embedded instrumentation, which we will address later.
In other cases, you need to rely on the estimated statistical lifetime distribution of the part for such assessment.
If the hazard function for a part increases over time, the probability of failure of that part prior to the next (evenly spaced) scheduled maintenance also increases with time, and proactive replacement warrants consideration.
For example, this would be the case for a part with a Weibull distribution for its lifetime with a shape parameter exceeding one [3, 4].
You might want to replace such parts during scheduled maintenance if their estimated probability of failure prior to the next scheduled maintenance exceeds a specified threshold, say one in 1,000, or alternatively, if this probability is twice its initial value.
The specific plan needs to balance the cost of prematurely replacing a part against the cost of its failure in service.
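The replacement rules just described can be sketched for a Weibull lifetime. The code below is a minimal illustration, assuming the scale (eta) and shape (beta) parameters have already been estimated; the function names and the numbers in the example are hypothetical. For a part that has survived to a given age, the probability of failure before the next maintenance is 1 - S(age + gap) / S(age), where S(t) = exp(-(t/eta)^beta) is the Weibull survival function.

```python
import math


def prob_fail_before_next(age, gap, eta, beta):
    """P(part fails in (age, age + gap] | part has survived to `age`).

    Uses the Weibull survival function S(t) = exp(-(t / eta) ** beta), so the
    result is 1 - S(age + gap) / S(age). expm1 keeps small values accurate.
    """
    exponent = (age / eta) ** beta - ((age + gap) / eta) ** beta
    return -math.expm1(exponent)


def replacement_flags(age, gap, eta, beta, abs_threshold=1e-3, ratio=2.0):
    """Evaluate the two alternative rules from the text for one part."""
    p_now = prob_fail_before_next(age, gap, eta, beta)
    p_new = prob_fail_before_next(0.0, gap, eta, beta)  # value for a new part
    return {
        "p_fail": p_now,
        "exceeds_threshold": p_now > abs_threshold,  # e.g. one in 1,000
        "doubled": p_now >= ratio * p_new,           # twice its initial value
    }


if __name__ == "__main__":
    # Hypothetical part: eta = 10,000 hours, beta = 2, maintenance every 100 hours.
    for age in (0, 100, 500):
        print(age, replacement_flags(age, gap=100, eta=10_000, beta=2))
```

With a shape parameter of exactly one, the computed probability is the same at every age, which is why proactive replacement offers no benefit for parts with a constant hazard.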
The resulting statistical evaluations can again be made more powerful by considering the operating environment.
For example, in deciding when to replace a part in a locomotive, the analysis should consider the terrain (flat or mountainous) in which the locomotive will be operating.
Automated monitoring for impending failures
Concept: A further strategy for avoiding or mitigating the impact of field failures is provided by new technology: the remote, and often continuous, monitoring of products using sophisticated instrumentation. Car owners have become acquainted with this approach through lights flashing on their dashboards, which signal a possible reliability problem (for example, an impending engine malfunction), safety concern (door not closed properly), environmental issue (emissions problem) or, sometimes, a false alarm (instrument malfunction).
Forewarning of an impending system failure allows for repairs in a minimally intrusive and cost-effective manner—ideally, without users even being aware of the problem. At a minimum, it can help identify what parts need to be replaced and the technical service personnel best suited to quickly make the repair.
Here’s a locomotive engine example: Modern locomotive engines are equipped with numerous sensors that read operating parameters such as oil pressure, oil temperature and water (coolant) temperature. The resulting information is used during normal operation to make automatic system adjustments (for example, to control fuel and oil flow), using control algorithms.
In addition, such data might be used to identify and avert impending failures by shutting down the system before serious damage occurs. In light of the inconvenience and lost revenue associated with shutdowns, it is, however, imperative to minimize the number of false alarms so the engine is shut down only when absolutely necessary.
Failure modes: Consider the occurrence of an apparent large drop in engine oil pressure. This could be due to any of the three reasons shown in Table 1, which also gives the associated problem severity and the action that would be taken if this were known to be the reason for the measured oil pressure drop. These actions range from activating a back-up sensor to shutting down the engine.
Thus, when an oil pressure drop is indicated, we need to find the reason so we can take the appropriate action. Sensor data on how oil pressure—and oil and water temperature—are changing over time can help make this determination.
Data acquisition: Each of the three measured oil pressure drop modes shown in Table 1 was induced in a lab test to observe the resulting sensor measurements. Figure 1 shows the readings on three sensors (monitoring oil pressure, oil temperature and water temperature) over a 30-second time period. The solid triangle marks the point at which the problem was induced, as evidenced by an appreciable change in one or more of the three measurements.
Analysis: Each of the three incidents resulted in a sudden, sharp drop in the reading for oil pressure. The accompanying changes in oil temperature and water temperature readings, however, differed. Thus, data from the three sensors might help distinguish among the three possible reasons for the apparent oil pressure drop:
- For the faulty sensor, the drop in the oil pressure reading was not accompanied by any discernible change in either oil or water temperature.
- For the cooling system failure, a sharp rise in water temperature preceded the drop in measured oil pressure, which was also accompanied by a sharp rise in oil temperature.
- For the oil pump failure, a sharp rise in oil temperature followed the drop in measured oil pressure. Water temperature remained unchanged.
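The three patterns above suggest a simple decision rule. The sketch below is only a toy illustration of that logic, not the algorithm developed for the locomotive engines, which relied on more advanced methods and further measurements. It assumes that an oil pressure drop has already been detected, that each temperature trace starts before the drop and ends after it, and that a hypothetical threshold (min_rise) separates a sharp rise from noise. It also ignores the timing of the rises relative to the pressure drop.

```python
from statistics import mean


def sharp_rise(readings, window, min_rise):
    """True if the mean of the last `window` readings exceeds the mean of the
    first `window` readings by at least `min_rise`."""
    if len(readings) < 2 * window:
        raise ValueError("trace too short for the chosen window")
    return mean(readings[-window:]) - mean(readings[:window]) >= min_rise


def classify_pressure_drop(oil_temp, water_temp, window=5, min_rise=5.0):
    """Suggest a reason for a measured oil pressure drop.

    `window` and `min_rise` are hypothetical tuning values; in practice they
    would have to be set from lab and field data.
    """
    oil_up = sharp_rise(oil_temp, window, min_rise)
    water_up = sharp_rise(water_temp, window, min_rise)
    if oil_up and water_up:
        return "cooling system failure"
    if oil_up:
        return "oil pump failure"
    if not water_up:
        return "faulty sensor"
    return "unrecognized pattern"


if __name__ == "__main__":
    # Synthetic 30-sample traces, for illustration only.
    flat = [80.0] * 30
    rising = [80.0] * 10 + [80.0 + 2 * i for i in range(20)]
    print(classify_pressure_drop(oil_temp=rising, water_temp=flat))
```

Returning "unrecognized pattern" rather than forcing a choice reflects the need to limit false alarms: an unfamiliar combination is better escalated for review than mapped onto a shutdown decision.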
Toward an algorithm: The data in Figure 1 were from a single engine under fixed test conditions in a lab. Similar data were obtained on different engines operating in the field under varying environments and involving various oil pressure drop incidents.
Results in example
The resulting patterns were sufficiently distinct, and sufficiently similar to those depicted in Figure 1, to support a useful algorithm that uses the three sensor measurements (and some further ones) to identify the reason for a drop in the measured oil pressure. In the great majority of cases, this led to appropriate remedial action, including the all-important decision of whether or not to shut down the system.
The analyses required the use of advanced multivariate [5] and time series [6] methods. Various other analytic methods are also used in automated monitoring for impending failures, including machine learning, neural nets and Bayesian belief networks [7, 8].
Statistical methods play an important part in proactive product servicing. Such work, as always, is conducted in collaboration with knowledgeable engineers and subject matter experts.
References and Notes
1. Gerald J. Hahn and Necip Doganaksoy, The Role of Statistics in Business and Industry, Wiley, 2008, section 9.9. This column is adapted from this book.
2. Luc Adjengue, Soumaya Yacout and Ozlem Ilk, "Parameter Estimation for Condition Based Maintenance With Correlated Observations," Quality Engineering, Vol. 19, No. 3, pp. 197-206, 2007. This article provides details on constructing CBM models statistically.
3. Gerald J. Hahn, Necip Doganaksoy and William Q. Meeker, "Reliability Improvement: Issues and Tools," Quality Progress, May 1999, pp. 133-139.
4. William Q. Meeker and Luis A. Escobar, Statistical Methods for Reliability Data, Wiley, 1998, section 4.8.
5. R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, sixth edition, Prentice Hall, 2007.
6. G.E.P. Box, G.M. Jenkins and G.C. Reinsel, Time Series Analysis: Forecasting and Control, third edition, Prentice Hall, 1994.
7. Trevor Hastie, Robert John Tibshirani and Jerome H. Friedman, The Elements of Statistical Learning, Springer, 2003.
8. Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1998.
Necip Doganaksoy is a statistician and Six Sigma Master Black Belt at the GE Global Research Center in Schenectady, NY. He has a doctorate in administrative and engineering systems from Union College in Schenectady. Doganaksoy is a fellow of ASQ and the American Statistical Association.
Gerald J. Hahn is a retired manager of statistics at the GE Global Research Center in Schenectady, NY. He has a doctorate in statistics and operations research from Rensselaer Polytechnic Institute in Troy, NY. Hahn is a fellow of ASQ and the American Statistical Association.
William Q. Meeker is professor of statistics and distinguished professor of liberal arts and sciences at Iowa State University in Ames, IA. He has a doctorate in administrative and engineering systems from Union College in Schenectady, NY. Meeker is a fellow of ASQ and the American Statistical Association.