Forecasting the number of future field failures
by William Q. Meeker, Necip Doganaksoy and Gerald J. Hahn
Manufacturers must frequently predict the number of future field failures for a product using past field-failure data, especially when an unanticipated failure mode is discovered in the field. Such predictions are needed to quantify future warranty costs and ensure a sufficient number of spare parts will be available to quickly repair failed units.
In extreme cases, failure predictions are also needed to decide whether a recall is warranted and, if so, which segments of the product population must be recalled—such as the units built during a specified period of time or those produced in a particular plant. Using an example of a fictitious company dealing with a failed part, we will describe statistical methods for making these predictions.
A home appliance part called component D was failing in the field. The company’s engineering department immediately investigated and found the cause of the problem to be related to a cost-saving process change made by the part’s manufacturer one year earlier.
The problem was corrected immediately for all future product, and the reliability of the corrected part was verified to be satisfactory by accelerated testing. Unfortunately, such testing had not been conducted at the time of the cost-saving measure.
One major problem remained: the nearly 300,000 appliances built during the past year and shipped with the faulty part. These were of particular concern because an eyeball examination of the limited failure data to date suggested the hazard rate was increasing over usage time—one reason it had taken a year to discover and focus on the problem.
Thus, although there had been relatively few failures—fewer than 200 to date—it was possible this problem could mushroom into a much larger one during the product’s three-year warranty period and in subsequent years.
In particular, the manufacturer wanted a prediction of how many of the nearly 300,000 units would fail during their first three years of life and the rate at which units would arrive for warranty repair. The manufacturer also wanted an idea of the failure behavior after the warranty period had ended.
Such predictions would lead to an assessment of the severity of the problem and whether it merited a product recall. The predictions also would allow the manufacturer to develop a plan to minimize the inconvenience experienced by customers, including a determination of how many replacement parts would be needed and by when. These assessments were to be based on a statistical analysis of the data on the failures that had already occurred.
Records on the total number of appliances manufactured each month were available from the company’s production department. Moreover, customers generally reported failures almost immediately after they occurred. Warranty repairs were implemented shortly thereafter and reported instantaneously via a wireless bar-coding system used by the repair person.
Unfortunately, the raw data were not accessible, and the only failure information available was a summary tabulation of the number of units in each month’s production that had failed to date, shown in Table 1.
Of the 24,057 units that have been in service for seven months, 11 had failed to date. We do not know when these failures took place during the seven months; therefore, in the analysis of the data, the failure times of these 11 failed units are “left censored” at seven months (that is, known only to fail sometime during their first seven months of service).
The failure times of the 24,046 unfailed units are, moreover, “right censored” at seven months (that is, the failure time is known only to be greater than seven months). Much field-failure data are, like these data, “multiply” right censored, meaning the units have differing censoring times due to the staggered entry over time of product into the field.1
Statistical analysis of life data
Such product life data generally cannot be analyzed by the methods taught in introductory statistics courses. One reason for this is the presence of censored data.
Another reason is that time-to-failure data do not typically follow a normal distribution. Instead, the Weibull and lognormal distributions have been found, based on theoretical and empirical grounds, to often (but not always) do a good job in representing such data. These distributions are characterized by their parameters—just as the normal distribution is described by its mean and standard deviation.
For example, a specific Weibull distribution is identified by its scale parameter η and its shape parameter β. These parameters, in turn, can be estimated from the available data (including censored data) using methods such as maximum likelihood (ML) and modern computer software, such as JMP 2010,2 Minitab 20103 and Splida/RSplida (used in our analyses).4
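As a rough illustration of how censored summary counts enter an ML fit, the sketch below writes the Weibull log-likelihood for data of the Table 1 form. It assumes the common parameterization in which the fraction failing by age t is 1 − exp[−(t/η)^β]. The function names and the (months, units, failed) tuple layout are illustrative choices, not taken from the software cited above, and only the two Table 1 rows quoted in this article are used.

```python
import math

def weibull_cdf(t, eta, beta):
    """Fraction failing by age t (months) under an assumed Weibull(eta, beta) model."""
    return -math.expm1(-((t / eta) ** beta))

def censored_loglik(groups, eta, beta):
    """Log-likelihood for summary data; each group is (months, units, failed).

    Failed units are left censored at `months` and contribute log F(months).
    Unfailed units are right censored and contribute log(1 - F(months)).
    Constant binomial terms are omitted because they do not depend on eta or beta.
    """
    total = 0.0
    for months, units, failed in groups:
        f = weibull_cdf(months, eta, beta)
        total += failed * math.log(f) + (units - failed) * math.log1p(-f)
    return total

# Only the two rows quoted in the text; the full Table 1 has more rows.
known_rows = [(2, 25389, 3), (7, 24057, 11)]

ll_reported = censored_loglik(known_rows, eta=952.92, beta=1.483)
# Arbitrary comparison point: same scale, but a constant hazard (beta = 1).
ll_constant_hazard = censored_loglik(known_rows, eta=952.92, beta=1.0)
```

An ML routine would maximize a function of this kind over η and β using all rows of the table. The comparison value with β = 1 is included only to show that the likelihood responds to the shape parameter.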
A Weibull and a lognormal distribution were fitted to the component D field-failure data shown in Table 1. Each fitted distribution was then used to estimate the fraction of product failing after various years in service.
Figure 1 shows a lognormal probability plot for the component D field-failure data, with the lognormal and Weibull ML estimates of the fraction of product failing as a function of months in service, as well as the observed failures.
In Figure 1, note that the plotted points are scattered around the fitted lines for each of the two distributions. This suggests the Weibull and lognormal distributions do a good job representing the data, at least during the first 12 months of service.
In extrapolating beyond about 15 months, the Weibull and lognormal estimates tend to diverge, with the predictions using the Weibull distribution being more pessimistic than those based on the lognormal distribution. Because it is more conservative than the lognormal, the fitted Weibull distribution was used for predictions.
The ML estimates for the scale and shape parameters of the Weibull distribution, fitted to the data, were η = 952.92 and β = 1.483, respectively. The fact that the estimated shape parameter was greater than 1 suggests, as expected, an increasing hazard rate over time and provides evidence of product wear out. Thus, a unit with 12 months of service has a higher probability of failing in the next month than a unit with only one month of service.
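The comparison between a 12-month-old unit and a one-month-old unit can be checked with a few lines of code. This is a minimal sketch that assumes the Weibull survival function exp[−(t/η)^β] with the scale expressed in months; the function names are illustrative.

```python
import math

ETA, BETA = 952.92, 1.483  # ML estimates reported in the article (scale in months)

def survival(t):
    """Assumed Weibull probability of surviving beyond t months."""
    return math.exp(-((t / ETA) ** BETA))

def prob_fail_next_month(age):
    """P(fail during the next month | unit has survived `age` months)."""
    return 1.0 - survival(age + 1) / survival(age)

p_young = prob_fail_next_month(1)   # unit with one month of service
p_old = prob_fail_next_month(12)    # unit with 12 months of service
ratio = p_old / p_young             # greater than 1 when the hazard is increasing
```

With a shape parameter above 1, the ratio exceeds 1, which is the wear-out behavior described above.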
Most importantly, based upon the Weibull distribution fit, the software estimated the fraction failing after 36 months to be 0.008, with an upper 95% statistical confidence bound (also calculated by the software) of 0.014.
Management was also concerned with customer satisfaction and the probability of failure beyond the warranty period, so estimates of the five-year and eight-year fraction failing were also requested. Again, using the more pessimistic Weibull distribution, these were 0.016 and 0.033, respectively, with associated upper 95% statistical confidence bounds of 0.037 and 0.087, respectively.
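The three point estimates quoted above follow directly from the fitted distribution. The sketch below assumes the Weibull form 1 − exp[−(t/η)^β] for the fraction failing by t months. It does not reproduce the confidence bounds, which need the software's estimates of parameter uncertainty.

```python
import math

ETA, BETA = 952.92, 1.483  # ML estimates reported in the article (scale in months)

def fraction_failing(months):
    """Assumed Weibull fraction of units failing by `months` in service."""
    return -math.expm1(-((months / ETA) ** BETA))

# Fraction failing after three, five and eight years in service.
estimates = {years: fraction_failing(12 * years) for years in (3, 5, 8)}
```

Rounded to three decimals, the values agree with the 0.008, 0.016 and 0.033 reported in the text.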
Based on these results, management decided, with much relief, that a product recall was not warranted. Instead, provisions were made to repair—as expeditiously and seamlessly as possible—those units in the field that would fail during the warranty period. For this purpose, it was important to predict the number of failures in each of the next 36 months—that is, the end of the warranty period for the youngest group of units in the field.
Predictions of future failures
Table 2 and Figure 2 show the predicted number of in-warranty failures in each of the next 36 months based on the fitted Weibull distribution. For example, a total of 51 (actually 50.58) units are predicted to fail during the fifth month from now. The sidebar, “Calculation of Expected Number of Failures in Each Future Month,” details how this number was calculated.
Table 2 and Figure 2 show that the predicted number of in-warranty failures increases each month for the next 24 months because of the wear-out nature of the failure mode. The predicted number of such failures, however, decreases rapidly starting in month 25 as older units begin to drop out of warranty coverage.
Figure 3 shows predictions for the cumulative number of future failures for the next 36 months. The total predicted number of in-warranty failures is 2,032.
Word of warning
The prediction of the number of in-warranty failures during the next 36 months involves an extrapolation of the fitted Weibull distribution beyond the 12 months of available data. This assumes the model continues to hold in the extrapolated region. Figure 1 suggests such an assumption might be highly questionable. Moreover, the uncertainty associated with this assumption is not included in the statistical confidence limits cited; these pertain to statistical sampling uncertainty only.
In light of this assumption, the analyses are to be redone every three months over the next two years to include newly acquired failure data. The results are to be used to update the estimates.
Other situations, more reading
The example involving component D is only one illustration of many practical situations in which you might want to predict the number of future failures. The details will vary case by case, especially in regard to the nature of the available field-failure data.
For example, unlike the case of component D, you frequently know the age of parts at their time of failure. In that case, the data would not involve left censoring of the failure times but would still include right-censored data (on the unfailed units), similar to the data described in a previous Statistics Roundtable column.5 Fortunately, the available software is sufficiently versatile to accommodate different situations.
Authors William Q. Meeker and Luis A. Escobar provided more technical details on future failure prediction in chapter 12 of Statistical Methods for Reliability Data. They also describe simulation-based methods for computing prediction intervals that quantify the statistical uncertainty—but not the model uncertainty—in the predictions.6
In a Technometrics article, authors Yili Hong and William Q. Meeker described a more general model for warranty failure prediction for a situation that involves multiple failure modes in which the expected number of failures for each mode needs to be predicted. Their article also showed how to factor use-rate information—information on the frequency of product use by customers—into the analysis.7
References and Notes
1. In Applied Life Data Analysis (John Wiley & Sons, 1982), Wayne Nelson gives another example of data with this kind of structure.
2. Information about JMP statistical software is available at http://www.jmp.com.
3. Information about Minitab 16 statistical software is available at http://minitab.com.
4. William Q. Meeker, Splida/RSplida reliability analysis software. Information available at www.public.iastate.edu/~splida.
5. Necip Doganaksoy, Gerald J. Hahn and William Q. Meeker, “Product Life Data Analysis: A Case Study,” Quality Progress, June 2000, pp. 115-122.
6. William Q. Meeker and Luis A. Escobar, Statistical Methods for Reliability Data, John Wiley & Sons, 1998, chapter 12.
7. Yili Hong and William Q. Meeker, “Field-Failure and Warranty Prediction Based on Auxiliary Use-Rate Information,” Technometrics, Vol. 52, 2010, pp. 148-159.
William Q. Meeker is professor of statistics and distinguished professor of liberal arts and sciences at Iowa State University in Ames. He has a doctorate in administrative and engineering systems from Union College in Schenectady, NY. Meeker is a fellow of ASQ and the American Statistical Association.
Necip Doganaksoy is a principal technologist-statistician at the GE Global Research Center in Schenectady, NY. He has a doctorate in administrative and engineering systems from Union College in Schenectady. Doganaksoy is a fellow of ASQ and the American Statistical Association.
Gerald J. Hahn is a retired manager of statistics at the GE Global Research Center in Schenectady. He has a doctorate in statistics and operations research from Rensselaer Polytechnic Institute in Troy, NY. Hahn is a fellow of ASQ and the American Statistical Association.
Calculation of Expected Number of Failures in Each Future Month
Online Table 1 shows the calculation of the expected number of in-warranty failures in each future month—shown in Table 2—by focusing on the estimate of 50.58 for the expected total number of in-warranty failures during the fifth month from now. The other numbers in Table 2 are calculated similarly.
As shown in Online Table 1, the prediction of 50.58 expected failures during the fifth month from now is obtained by:
- Estimating (per subsequent discussion) the expected number of failures during the fifth month from now among the currently unfailed (and at risk) units that have been in service for one month
- Estimating the expected number of failures during the fifth month from now among the currently unfailed units that have been in service for two months
- Making similar estimates for each of the other ten months-in-service groups
- Adding these 12 predictions to obtain a total of 50.58 estimated expected failures.
In making such calculations for times beyond the next 24 months, any months-in-service group whose total exposure (the months in service to date plus the number of future months) would exceed 36 months is excluded, because those units are no longer under warranty.
Thus, for example, in calculating the expected number of in-warranty failures in the 32nd month from now, we are only concerned with units that came into service during the most recent four months, omitting those that came into service during the first eight months.
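The exclusion rule can be written as a one-line filter. The sketch below assumes 12 months-in-service groups labeled 1 through 12 and treats a group as in warranty during a future month when months in service plus future months does not exceed 36; the function name is illustrative.

```python
WARRANTY_MONTHS = 36

def groups_in_warranty(future_month, groups=range(1, 13)):
    """Months-in-service groups still under warranty during `future_month`."""
    return [m for m in groups if m + future_month <= WARRANTY_MONTHS]

month_5 = groups_in_warranty(5)    # every group still counts
month_25 = groups_in_warranty(25)  # the oldest group has dropped out
month_32 = groups_in_warranty(32)  # only the four most recent groups remain
```

Under this rule the oldest group leaves the calculation in the 25th future month, in line with the decline that Table 2 shows from month 25 onward.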
Moreover, the estimated expected number of failures during the fifth month from now for units that have been in service for a specified number of months is the product of the number of units at risk and the estimated conditional failure probability (during the fifth month from now) for units with that number of months in service.
For example, the expected number of failures during the fifth month from now of 3.55 for units that have been in service for two months is the product of the 25,386 units at risk (25,389 installed – 3 failures to date) and the conditional probability of failure during the month (0.0001397).
Finally, the conditional probability (0.0001397) that a unit that has been in service two months without failure will fail some time during the fifth month from now (or during its seventh month of life) is calculated as:

$$\frac{\text{Probability(Unit will survive for 6 months)} - \text{Probability(Unit will survive for 7 months)}}{\text{Probability(Unit will survive for 2 months)}}$$
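Substituting the corresponding expressions for the Weibull distribution1 with η = 952.92 and β = 1.483, under which the probability of surviving t months is exp[−(t/η)^β], the estimated conditional probability of a unit that has been in service for two months failing during its seventh month in service (or during the fifth month from now) is:

$$\frac{\exp\left[-\left(\frac{6}{952.92}\right)^{1.483}\right] - \exp\left[-\left(\frac{7}{952.92}\right)^{1.483}\right]}{\exp\left[-\left(\frac{2}{952.92}\right)^{1.483}\right]} \approx 0.0001397$$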
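The conditional probability and the 3.55 expected failures quoted for the two-month group can be checked numerically. This is a minimal sketch that assumes the Weibull survival function exp[−(t/η)^β] and uses the rounded parameter estimates printed in the article, so the result agrees with the published 0.0001397 only to within rounding. The function names are illustrative.

```python
import math

ETA, BETA = 952.92, 1.483  # rounded ML estimates from the article (scale in months)

def survival(t):
    """Assumed Weibull probability of surviving beyond t months."""
    return math.exp(-((t / ETA) ** BETA))

def conditional_failure_prob(months_in_service, future_month):
    """P(fail during the given future month | unfailed at months_in_service)."""
    start = months_in_service + future_month - 1  # age at start of that month
    return (survival(start) - survival(start + 1)) / survival(months_in_service)

# Two-month group, fifth month from now (the unit's seventh month of life).
p = conditional_failure_prob(2, 5)
at_risk = 25389 - 3           # installed units minus failures to date
expected = at_risk * p        # expected failures from this group in that month
```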
At first glance, the preceding sounds pretty complicated. It can, however, be organized conveniently in a spreadsheet in which the columns correspond to future months and the rows represent different months-in-service groups. The entries in the spreadsheet would be the expected number of failures in a specific future month from a specific months-in-service group.
Summing over rows of the spreadsheet would then provide predictions for the total number of failures in each future month, as shown in Table 2. The calculations and graphics in our example were done using a simple function written in R.2 There are also commercially available software packages that have similar capabilities, such as Minitab 2010 and Weibull++.3-4
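The spreadsheet layout described here can be sketched in a few lines of code. The version below is not the authors' R function; it is an illustrative Python sketch that assumes the Weibull survival function exp[−(t/η)^β] and a 36-month warranty, with hypothetical function names. Because only two rows of Table 1 are quoted in the text, the totals it produces cover those two groups only and do not reproduce Table 2.

```python
import math

ETA, BETA = 952.92, 1.483  # rounded ML estimates from the article (scale in months)
WARRANTY_MONTHS = 36

def survival(t):
    """Assumed Weibull probability of surviving beyond t months."""
    return math.exp(-((t / ETA) ** BETA))

def expected_failures_table(at_risk_by_age, horizon=36):
    """Rows are months-in-service groups; columns are future months 1..horizon."""
    table = {}
    for age, at_risk in at_risk_by_age.items():
        row = []
        for k in range(1, horizon + 1):
            end = age + k  # unit's age at the end of future month k
            if end > WARRANTY_MONTHS:
                row.append(0.0)  # group is out of warranty by then
            else:
                p = (survival(end - 1) - survival(end)) / survival(age)
                row.append(at_risk * p)
        table[age] = row
    return table

def monthly_totals(table):
    """Sum over the groups to get the expected failures in each future month."""
    return [sum(column) for column in zip(*table.values())]

# Units at risk for the two groups quoted in the text (installed minus failed).
at_risk = {2: 25389 - 3, 7: 24057 - 11}
table = expected_failures_table(at_risk)
totals = monthly_totals(table)
```

Supplying the units at risk for all 12 groups would fill in the remaining rows, and the column totals would then correspond to the monthly predictions.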
—W.Q.M., N.D. and G.J.H.
1. Necip Doganaksoy, Gerald J. Hahn and William Q. Meeker, “Product Life Data Analysis: A Case Study,” Quality Progress, June 2000, pp. 115-122.
2. R Development Core Team, “R: A Language and Environment for Statistical Computing,” www.R-project.org.
3. More information about Minitab 16 statistical software is available at http://minitab.com/.
4. More information about Weibull++ reliability analysis software is available at http://www.reliasoft.com/Weibull/.