Improve Your Evaluations
Bayesian methods use prior knowledge in life analyses
by William Q. Meeker, Necip Doganaksoy and Gerald J. Hahn
In an earlier Statistics Roundtable column, we described how the conclusions you can draw from statistical analysis of limited life data can be bolstered by appropriately incorporating engineering knowledge and experience into the analysis. Now, let’s demonstrate how Bayesian methods can be used as an alternative in these evaluations.
The previous column¹ dealt with a new design for a bearing cage used in aircraft engines for which B10 life (the time by which 10% of the units fail) was required to exceed 8,000 hours. The available data consisted of field data from 1,703 units: only six units had failed during exposure times ranging from 50 to 2,050 hours.
A Weibull distribution was fitted to the lifetime data, using maximum likelihood (ML) methods. Mostly due to the extensive extrapolation required and the small number of failures observed, this analysis was highly uninformative, resulting in a 95% confidence interval on B10 of 2,093 to 22,144 hours.
From engineering knowledge and analysis of past data, you would expect the product hazard rate to increase with time in this example, and a Weibull distribution with shape parameter β between 1.5 and 3 to provide a good fit to the time-to-failure data. We used these insights to fit Weibull distributions assuming β = 1.5, β = 2 and β = 3, respectively. Each fit resulted in appreciably shorter confidence intervals on B10 and suggested it was highly unlikely that the 8,000-hour reliability goal would be met.
The Bayesian method
The preceding analyses provided useful insights. Management, however, wanted a more definitive analysis with a single quantitative estimate of reliability and the associated statistical uncertainty (and one that could also be readily included in a system reliability model of which the bearing cage would be just one component).
This can be achieved by assigning prior probabilities to plausible combinations of values of the unknown distribution parameters, usually via a continuous joint distribution, and using Bayes’ Theorem to combine the prior distribution and the observed data into a posterior joint distribution for the parameters. This may be thought of as a formal way of averaging the prior knowledge with the given data.
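In symbols, the update can be sketched as follows, where θ denotes the vector of unknown distribution parameters, f(θ) the prior density and L(data | θ) the likelihood of the observed data. This notation is introduced here for illustration and does not appear in the original column:

```latex
f(\theta \mid \text{data})
  \;=\; \frac{L(\text{data} \mid \theta)\, f(\theta)}
             {\int L(\text{data} \mid \theta')\, f(\theta')\, d\theta'}
  \;\propto\; L(\text{data} \mid \theta)\, f(\theta)
```

The denominator is only a normalizing constant, which is why simulation methods that need the posterior up to a constant of proportionality are sufficient in practice.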
There has been a substantial increase in the use of Bayesian methods during the past 20 years by both statisticians and practitioners in a large variety of application areas, ranging from biology to economics. This has been spurred on by the rapid advances in statistical and computational science that make it possible to compute accurate approximations to Bayesian posterior distributions.
Today, most of these applications use Monte Carlo simulations to generate a sample from the desired joint posterior distribution. The results are used to compute estimates and credible intervals for quantities of interest (for example, B10). Similar to classical confidence intervals, such credible intervals describe the statistical uncertainty arising from limited data, but they use both the prior information and the observed data. As a result, they are often appreciably shorter than confidence intervals based on the data alone.²
Applying the Bayesian method
Choice of a prior distribution. Historically, there has been appreciable controversy about the use of Bayesian methods because they require specification of a prior distribution, the choice of which, especially with sparse data, could have strong influence on the resulting inferences. This raises the question: "Whose prior distribution should be used?" A prudent analyst will use only prior information that can be justified from physics, engineering or other accepted knowledge, or information that has been supported by past data—and preferably all of these.
In the bearing cage example, you must specify a joint prior distribution for the two distribution parameters (the shape parameter β and the scale parameter η) of the assumed Weibull time-to-failure distribution. Based on previous data, the engineers thought the bearing cage failures were due to a well-understood fatigue mechanism, and they confirmed this by analysis of data on similar products. Therefore, they specified a relatively well-defined prior distribution for β: a lognormal distribution with 99% of its probability between 1.5 and 3.
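One way to turn the engineers' statement into lognormal parameters is sketched below in Python. It assumes the 1% of probability outside the range is split equally between the two tails (0.5% below 1.5 and 0.5% above 3); the column does not say how the tails were allocated, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def lognormal_from_range(lower, upper, coverage=0.99):
    """Return (mu, sigma) of log(X) such that P(lower < X < upper) = coverage,
    assuming equal probability in each tail (an assumption, not from the column)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)   # about 2.576 for 99%
    mu = 0.5 * (math.log(lower) + math.log(upper))   # midpoint on the log scale
    sigma = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return mu, sigma

mu, sigma = lognormal_from_range(1.5, 3.0)
prior_median = math.exp(mu)   # geometric mean of 1.5 and 3
```

Under this assumption the prior median for β is the geometric mean of the two limits, a little above 2.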
Instead of assuming a prior distribution for η, a prior distribution was assumed for the more meaningful B10 (thereby, together with the prior distribution for β, implicitly specifying a joint prior distribution for β and η). There was, however, little prior information for B10, which, after all, was the quantity to be estimated. In fact, B10 might be as low as 1,000 hours or as high as 50,000 hours. Therefore, a "diffuse" prior distribution was employed: a uniform distribution for the logarithm of B10, ranging from log(1,000) to log(50,000), with B10 in hours. (Using a log-uniform prior distribution for B10, as opposed to a uniform distribution, is more conservative in that the Bayesian point estimate and upper credible bound for B10 are somewhat lower with the log-uniform distribution than with the uniform distribution. Also, distributions that are limited to variables that can take on only positive values, such as the lognormal and log-uniform, are commonly used as prior distributions for quantities that are required to be positive, such as β and B10.)
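Because the prior is stated for B10 rather than for the scale parameter, each sampled pair has to be converted back to the Weibull scale before a likelihood can be evaluated. The Python sketch below shows that conversion and a draw from the two priors. It assumes the standard Weibull form F(t) = 1 - exp[-(t/η)^β], independent priors for B10 and β, and lognormal parameters (0.752 and 0.1345 on the log scale) that follow from placing equal 0.5% tails outside 1.5 and 3. None of these details is spelled out in the column, and the function names are illustrative.

```python
import math
import random

def eta_from_b10(b10, beta, p=0.10):
    """Weibull scale parameter implied by the p quantile (B10 when p = 0.10)."""
    return b10 / (-math.log(1.0 - p)) ** (1.0 / beta)

def sample_prior(n, rng, mu=0.752, sigma=0.1345,
                 b10_low=1_000.0, b10_high=50_000.0):
    """Draw n (B10, beta) pairs: log-uniform B10 and lognormal beta.

    The two priors are sampled independently (an assumption of this sketch).
    """
    pairs = []
    for _ in range(n):
        b10 = math.exp(rng.uniform(math.log(b10_low), math.log(b10_high)))
        beta = rng.lognormvariate(mu, sigma)
        pairs.append((b10, beta))
    return pairs

pairs = sample_prior(500, random.Random(1))
etas = [eta_from_b10(b10, beta) for b10, beta in pairs]
```

Sampling uniformly on the log scale puts as much prior probability between 1,000 and 2,000 hours as between 25,000 and 50,000 hours, which is what makes the log-uniform choice lean toward lower values of B10 than a uniform prior would.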
Results. Samples from the marginal posterior distributions for B10 and β were obtained via simulation, effectively combining the observed data with the prior distribution. The 95% credible interval for B10 was taken as the 0.025 and 0.975 quantiles from the distribution of these samples. This yielded the 95% credible interval for B10 of 2,575 to 7,004 hours. Because the upper bound of this interval is below 8,000 hours, the Bayesian method, unlike the analysis based on the observed data alone, showed the new bearing cage design was unlikely to meet its reliability goal.
Further analysis. Figure 1 is a Weibull probability plot of the bearing cage field-failure data, showing the fitted Weibull posterior distribution and the associated 95% credible interval for the fraction failing as a function of time, based on the joint posterior distribution of B10 and β. The confidence intervals based on analysis of the data alone (previously shown in Figure 2 of the November 2011 column) also are shown in Figure 1. The Bayesian method provides much better precision, as shown by the appreciably narrower dashed lines for the credible intervals versus those for the non-Bayesian confidence intervals in the region of interest (8,000 hours of life).
Outline of method
This section briefly outlines the Bayesian method used in the preceding analysis of the bearing cage data.³ This analysis involved three steps.
Step one—Generate samples from prior distribution: Monte Carlo simulation is used to generate a large random sample of pairs of values of B10 and β from their (joint) prior distribution. The values for the first 500 samples are shown in Figure 2.
Figure 2 also shows the likelihood of the observed data relative to the ML value as a function of B10 and β in a contour plot. The relative likelihood is proportional to the probability of the observed data for different values of B10 and β. Pairs of values giving large likelihoods are more plausible than pairs with small likelihoods. These contours show what the data alone say about B10 and β.
The upper right corner of the rectangle in Figure 2 marks the best (that is, ML) point estimates (3,903 hours for B10 and 2.035 for β) based on the data alone. At this point, the relative likelihood is equal to 1. Also, we can say, based on the data alone, that we are approximately 90% confident that the true values of B10 and β lie in the region enclosed by the 0.1 contour;⁴ similar statements apply for the other contours.
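The correspondence between the 0.1 contour and roughly 90% confidence can be sketched as follows, assuming the usual large-sample chi-square approximation for the likelihood ratio with two parameters. Here R denotes the relative likelihood, a symbol introduced for illustration:

```latex
-2\ln R(B_{10},\beta) \;\le\; \chi^2_{(0.90;\,2)} = -2\ln(1-0.90) \approx 4.605
\quad\Longleftrightarrow\quad
R(B_{10},\beta) \;\ge\; e^{-4.605/2} = 0.10
```

Because the chi-square distribution with two degrees of freedom has distribution function 1 - exp(-x/2), a contour at relative likelihood c corresponds under this approximation to a joint confidence level of about 100(1 - c)%.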
Step two—Integrate prior information and given data in a posterior distribution: Each pair of randomly generated values of B10 and β from the assumed prior distribution in step one is combined with the observed data (represented by the relative likelihood contours) to generate a sample from the joint posterior distribution of B10 and β.
For complicated statistical models with many parameters, advanced Markov chain Monte Carlo (MCMC) methods are typically used in such Bayesian analyses to generate posterior distribution samples.⁵ For some problems, such as our example, there is a simple probabilistic filtering algorithm to generate a posterior distribution sample from a prior distribution sample. Using this method, every point in the sample from the prior distribution has a corresponding relative likelihood, ranging from close to 0 (for points far from the ML estimate) to close to 1 (for points near it).
In the filtering algorithm, each prior distribution point passes to the posterior distribution with a probability corresponding to its relative likelihood, based on the observed data. The pairs of values of B10 and β that pass through this filter are accepted as samples from the joint posterior distribution for these two quantities.
The first 500 pairs of values after filtering are shown in Figure 3. These points are concentrated around the overlap of the likelihood contours (repeated in Figure 3) and the prior distribution sample points shown in Figure 2. The plotted posterior distribution points in Figure 3 show much less scatter than the prior distribution points in Figure 2, reflecting the reduced uncertainty concerning B10 and β resulting from the observed data.
The upper-right corner of the rectangle in Figure 3 shows the Bayes’ best estimates for B10 and β (4,115 hours and 2.097), based on combining the prior distribution and the available data—computed as the means of the generated B10 and β marginal posterior distributions, respectively.
The simulation analyses use random numbers in generating pairs of values from the prior distribution for B10 and β, and therefore are not exactly repeatable, due to Monte Carlo sampling error. This error can be made arbitrarily small by increasing the number of Monte Carlo trials. We used a sample of 10 million pairs (of which 500 pairs of values are shown in Figures 2 and 3) to achieve extremely high precision. In many practical applications, 10,000 to 20,000 samples provide reasonable precision in the estimated credible intervals.
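How quickly the Monte Carlo error shrinks can be illustrated for the simplest case, the error in a posterior mean estimated from independent draws. The Python sketch below uses stand-in draws from a hypothetical lognormal distribution, not the bearing cage posterior. The error in an estimated quantile, such as a credible bound, behaves similarly but is not computed here.

```python
import math
import random
import statistics

def mc_standard_error(draws):
    """Monte Carlo standard error of the mean of independent draws."""
    return statistics.stdev(draws) / math.sqrt(len(draws))

rng = random.Random(7)
# Stand-in "posterior" draws for B10 (hypothetical, for illustration only).
small = [rng.lognormvariate(8.3, 0.25) for _ in range(1_000)]
large = [rng.lognormvariate(8.3, 0.25) for _ in range(100_000)]

se_small = mc_standard_error(small)
se_large = mc_standard_error(large)
# With 100 times as many draws, the error shrinks by roughly a factor of 10.
```

The square-root relationship means each additional digit of precision costs about 100 times as many draws.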
Step three—Use generated values of posterior distribution to draw desired inferences: The generated posterior pairs of values for B10 and β obtained in step two are used to obtain Bayesian point and interval estimates. In our example, this yielded a posterior point estimate of 4,115 hours for B10, based upon averaging the B10 generated posterior values (compared with the ML point estimate of 3,903 hours).
Also, by taking the 0.025 and 0.975 quantiles of the B10 generated values, the 95% credible interval of 2,575 hours to 7,004 hours for B10 was obtained. Similar methods were used for obtaining a posterior estimate of β (not shown here) and for estimating the fraction failing at different times shown in Figure 1.
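Step three amounts to a few summary statistics on the posterior draws. A minimal Python sketch follows; the function name is illustrative and the draws are stand-ins from a hypothetical distribution, not the values behind the results reported here.

```python
import random
import statistics

def summarize_posterior(draws):
    """Posterior mean and equal-tailed 95% credible interval from a list of draws."""
    # n=40 gives cut points at 0.025, 0.050, ..., 0.975; keep the first and last.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), (cuts[0], cuts[-1])

rng = random.Random(3)
# Stand-in posterior draws for B10 (hypothetical, for illustration only).
b10_draws = [rng.lognormvariate(8.3, 0.25) for _ in range(20_000)]
b10_mean, (b10_lower, b10_upper) = summarize_posterior(b10_draws)
```

The same function applies unchanged to the β draws, or to any function of each (B10, β) pair, such as the fraction failing by a given time.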
A precautionary note
Traditional methods require various assumptions—for example, a Weibull distribution for time to failure and representative samples and test environments—that demand careful examination. Bayesian methods require the further assumption of a prior distribution based on existing knowledge, adding further risk to the analysis.
Users of such methods must be wary of wishful thinking masquerading as prior information. The selected prior distribution and how well it represents existing knowledge must be carefully scrutinized. This should include consideration of reasonable alternative prior distributions and an evaluation of the sensitivity of the findings to such alternatives.
References and Notes
1. William Q. Meeker, Necip Doganaksoy and Gerald J. Hahn, "Use What You Know," Quality Progress, November 2011, pp. 52-54. The authors used data from Robert B. Abernethy, J.E. Breneman, C.H. Medlin and G.L. Reinman, Weibull Analysis Handbook, Air Force Wright Aeronautical Laboratories Technical Report AFWAL-TR-83-2079, 1983, http://handle.dtic.mil/100.2/ADA143100.
2. For a discussion of Bayesian methods in reliability data analysis, see Ming Li and William Q. Meeker, Application of Bayesian Methods in Reliability Analysis, Iowa State University Department of Statistics, preprint, 2012.
3. For further details on the method, see William Q. Meeker and Luis A. Escobar, Statistical Methods for Reliability Data, John Wiley & Sons, 1998.
4. For further details, see William Q. Meeker and Luis A. Escobar, Statistical Methods for Reliability Data, John Wiley & Sons, 1998, section 8.3.1.
5. For details on a more general method, see Andrew Gelman, John B. Carlin, Hal S. Stern and Donald B. Rubin, Bayesian Data Analysis, second edition, Chapman & Hall, 2004.
William Q. Meeker is professor of statistics and distinguished professor of liberal arts and sciences at Iowa State University in Ames, IA. He has a doctorate in administrative and engineering systems from Union College in Schenectady, NY. Meeker is a fellow of ASQ and the American Statistical Association.
Necip Doganaksoy is a principal technologist-statistician at the GE Global Research Center in Schenectady, NY. He has a doctorate in administrative and engineering systems from Union College in Schenectady. Doganaksoy is a fellow of ASQ and the American Statistical Association.
Gerald J. Hahn is a retired manager of statistics at the GE Global Research Center in Schenectady, NY. He has a doctorate in statistics and operations research from Rensselaer Polytechnic Institute in Troy, NY. Hahn is a fellow of ASQ and the American Statistical Association.