What’s Driving Uncertainty?
The influences of model and model parameters in data analysis
by Christine M. Anderson-Cook
One of the substantial improvements to the practice of data analysis in recent decades is the change from reporting just a point estimate for a parameter or characteristic to now including a summary of uncertainty for that estimate. Understanding the precision of the estimate for the quantity of interest provides a better understanding of what to expect and how well we are able to predict future behavior from the process.
For example, when we report a sample average as an estimate of the population mean, it is good practice to also provide a confidence interval (CI)—or credible interval if you are doing a Bayesian analysis—to accompany that summary. This helps to calibrate what ranges of values are reasonable given the variability observed in the sample and the amount of data included in producing the summary.
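As a minimal sketch of this practice, the snippet below computes a t-based 95% CI for a population mean. The measurements are made up for illustration, and the critical value is taken from a standard t table rather than computed, so both are assumptions and not part of the original example.

```python
import math
import statistics


def mean_confidence_interval(sample, t_crit):
    """Return (mean, lower, upper) for a two-sided t-based CI for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # statistics.stdev uses the n - 1 divisor (sample standard deviation).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    half_width = t_crit * std_err
    return mean, mean - half_width, mean + half_width


# Hypothetical measurements, for illustration only.
sample = [9.8, 10.1, 10.4, 9.9, 10.2, 10.0, 10.3, 9.7, 10.1, 10.5]

# Tabulated 97.5th percentile of the t distribution with n - 1 = 9 df.
T_CRIT_95_DF9 = 2.262

mean, lower, upper = mean_confidence_interval(sample, T_CRIT_95_DF9)
print(f"mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With more data or less variability in the sample, the standard error shrinks and the interval narrows.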
Estimating density example
Recently, I encountered an example that demonstrates the contributions from several sources we may wish to include in our assessment of the uncertainty. An engineer had obtained a data set with 30 observations that she wanted to use to estimate the density of a material of interest as a function of the concentration of the key ingredient. The overall goal was to identify the concentration at which the density is minimized.
Subject matter expertise for the process suggested that a quadratic model of the form $\text{Dens}_i = \beta_0 + \beta_1 \text{Conc}_i + \beta_2 \text{Conc}_i^2 + \varepsilon_i$ should be adequate to summarize the relationship between the explanatory variable (concentration) and the response (density). Figure 1 shows the results when that model was fit to the available data (using least-squares estimation), along with a 95% CI for the curve. The CI provides uncertainty bounds for where the estimated mean curve lies, and differs from a prediction interval, which shows where we would expect new observations to be found if more data were collected from the same underlying mechanism.¹
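To make the distinction between the two intervals concrete, here is a minimal sketch of a quadratic least-squares fit with both a CI for the mean curve and a prediction interval. The engineer's data are not reproduced in this column, so the 30 observations below are simulated, and the function names and the tabulated critical value are assumptions made for illustration.

```python
import numpy as np


def fit_polynomial(conc, dens, degree):
    """Least-squares polynomial fit of density on concentration."""
    X = np.vander(conc, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    beta, *_ = np.linalg.lstsq(X, dens, rcond=None)
    resid = dens - X @ beta
    resid_df = len(dens) - X.shape[1]
    sigma2 = resid @ resid / resid_df  # estimate of the error variance
    xtx_inv = np.linalg.inv(X.T @ X)
    return beta, xtx_inv, sigma2, resid_df


def interval_half_widths(x_new, beta, xtx_inv, sigma2, t_crit):
    """Fitted values plus half-widths of the CI and the prediction interval."""
    X_new = np.vander(x_new, len(beta), increasing=True)
    fitted = X_new @ beta
    # x0' (X'X)^-1 x0 for each new point x0.
    leverage = np.einsum("ij,jk,ik->i", X_new, xtx_inv, X_new)
    ci_half = t_crit * np.sqrt(sigma2 * leverage)          # mean curve
    pi_half = t_crit * np.sqrt(sigma2 * (1.0 + leverage))  # new observation
    return fitted, ci_half, pi_half


# Simulated stand-in for the 30 observations (not the engineer's data).
rng = np.random.default_rng(0)
conc = np.linspace(1.0, 29.0, 30)
dens = 12.0 - 0.4 * conc + 0.02 * conc**2 + rng.normal(0.0, 0.3, conc.size)

beta, xtx_inv, sigma2, resid_df = fit_polynomial(conc, dens, degree=2)

# Tabulated 97.5th percentile of t with 30 - 3 = 27 df.
T_CRIT_95_DF27 = 2.052
grid = np.array([5.0, 10.0, 15.0, 25.0])
fitted, ci_half, pi_half = interval_half_widths(
    grid, beta, xtx_inv, sigma2, T_CRIT_95_DF27
)
for x, f, c, p in zip(grid, fitted, ci_half, pi_half):
    print(f"conc={x:5.1f}  fit={f:6.2f}  CI +/- {c:.2f}  PI +/- {p:.2f}")
```

The prediction interval is always wider than the CI at the same concentration because it adds the variability of a single new observation to the uncertainty in the mean curve.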
At first glance, the model seems to fit reasonably well with the overall trends in the data being appropriately captured by the estimated model.
The engineer also decided to explore a slightly more complicated model, which allows extra flexibility to consider additional curvature. Hence, in addition to fitting the quadratic model, she also fit a cubic model of the form $\text{Dens}_i = \beta_0 + \beta_1 \text{Conc}_i + \beta_2 \text{Conc}_i^2 + \beta_3 \text{Conc}_i^3 + \varepsilon_i$ to see whether this provided an improved fit. The results of this fit are shown in Figure 2, with the accompanying 95% CI.
Superficially, the curve also seems to fit the data well, although the general shape does show some notable differences from the quadratic model. For larger concentrations (on the right-hand side of the plot), the rate of increase of the curve seems to diminish with the cubic model, and the shape around the minimum also seems to differ. Table 1 shows a formal comparison of the two models.
R² (the larger, the better) summarizes the fraction of the total variability of density observed in the sample that is explained by each model. Adjusted R² adds a penalty for larger models and generally is a better summary than R² for comparing models of different sizes. The predicted residual error sum of squares (PRESS) statistic² (the smaller, the better) is a form of leave-one-out cross-validation that assesses the ability of the model to predict.
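The three summaries can be computed directly from a least-squares fit. The sketch below is one way to do so, using simulated data in place of the engineer's observations; the function name and the data are assumptions for illustration. PRESS is obtained from the ordinary residuals and the diagonal of the hat matrix, which avoids refitting the model 30 times.

```python
import numpy as np


def fit_summaries(conc, dens, degree):
    """R^2, adjusted R^2 and PRESS for a polynomial least-squares fit."""
    X = np.vander(conc, degree + 1, increasing=True)
    n, p = X.shape
    hat = X @ np.linalg.pinv(X)  # hat matrix H = X (X'X)^-1 X'
    resid = dens - hat @ dens
    sse = resid @ resid
    sst = np.sum((dens - dens.mean()) ** 2)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    # Leave-one-out residual for point i is e_i / (1 - h_ii).
    press = np.sum((resid / (1.0 - np.diag(hat))) ** 2)
    return {"r2": r2, "adj_r2": adj_r2, "press": press}


# Simulated stand-in data (not the engineer's observations).
rng = np.random.default_rng(0)
conc = np.linspace(1.0, 29.0, 30)
dens = 12.0 - 0.4 * conc + 0.02 * conc**2 + rng.normal(0.0, 0.3, conc.size)

quadratic = fit_summaries(conc, dens, degree=2)
cubic = fit_summaries(conc, dens, degree=3)
for name, s in (("quadratic", quadratic), ("cubic", cubic)):
    print(f"{name:9s} R2={s['r2']:.4f} adjR2={s['adj_r2']:.4f} PRESS={s['press']:.3f}")
```

Because the quadratic model is nested in the cubic one, R² can never decrease when the cubic term is added, which is why adjusted R² and PRESS are the more informative comparisons.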
Based on the adjusted R², the cubic model is preferred. Using the PRESS statistic, the quadratic model is preferred. When we look at a formal test of the cubic term, we reject the null hypothesis that it has a value of zero (p-value ≈ 0.001) and conclude that there is strong evidence that this term should not be removed from the model.
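A formal test of the cubic term compares the estimated coefficient to its standard error. The sketch below computes that t statistic for simulated data generated from a cubic curve; the data, the function name and the tabulated critical value are assumptions for illustration. Concentration is centered and scaled before fitting to improve numerical conditioning, which does not change the t statistic for the highest-order term.

```python
import numpy as np


def highest_order_t_statistic(conc, dens, degree):
    """t statistic and residual df for the highest-order polynomial term."""
    z = (conc - conc.mean()) / conc.std()  # centered and scaled concentration
    X = np.vander(z, degree + 1, increasing=True)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, dens, rcond=None)
    resid = dens - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the estimates
    return beta[-1] / np.sqrt(cov[-1, -1]), n - p


# Simulated data from a cubic curve (not the engineer's observations).
rng = np.random.default_rng(0)
conc = np.linspace(1.0, 29.0, 30)
dens = (13.0 - 0.6 * conc + 0.04 * conc**2 - 0.0008 * conc**3
        + rng.normal(0.0, 0.1, conc.size))

t_stat, resid_df = highest_order_t_statistic(conc, dens, degree=3)

# Tabulated 97.5th percentile of t with 30 - 4 = 26 df.
T_CRIT_95_DF26 = 2.056
reject = abs(t_stat) > T_CRIT_95_DF26
print(f"t = {t_stat:.2f} on {resid_df} df; reject H0 (beta3 = 0): {reject}")
```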
Therefore, the statistical summaries do not agree on which model is preferred, and this is coupled with the current engineering understanding of the relationship, which suggests a quadratic model.
What model to choose?
Traditionally, it has been common at this point to pick the better model (here, perhaps the cubic model) and report the estimated curve from this model with its associated uncertainty as the summary of the results. The CI shown in Figure 2 captures the uncertainty associated with estimating the model parameters, conditional on that model being correct. Hence, this is often referred to as model parameter uncertainty. There is, however, clearly more going on here. We have gone through a process by which we considered more than one model and selected the one we judged best, and now we want to decide what uncertainty to associate with the estimated curve.
There is another source of uncertainty that we also should acknowledge and account for in our reporting. Did we, in fact, choose the right model? This idea is captured by model uncertainty, and reflects a potentially bigger contributor to the outcome of our study, the interpretation of results and our overall confidence in reported results. Its source lies in the process that we use for selecting the final model on which to report, and in some cases, may play a bigger role in affecting our predictions than the model parameter uncertainty.
For our engineer, the main goal was to identify at what concentration the minimum density occurs, and the expected value of the density at that location. If we just consider the cubic model, the minimum density is estimated to occur at a concentration of 9.7 with a value of 9.92. The CI at that concentration suggests a range of the mean density from 9.56 to 10.28. Christine M. Anderson-Cook, Yongtao Cao and Lu Lu offer suggestions on how to summarize the uncertainty in identifying the ideal concentration value.³
If, however, we consider the possibility that the quadratic model is the true model (after all, the science suggests that this might be the right one), the minimum density is estimated to occur at a concentration of 10.8 with a value of 10.05 (95% CI is [9.59, 10.51]). Figure 3 shows the two estimated curves overlaid, with the quadratic curve and its CI shown in red and the cubic curve and its CI in blue. The solid colored circle on each estimated curve marks the best estimate of where the minimum density lies for that curve.
In this case, both the location of the minimum and the estimated density at that minimum differ between the two models. If we were selecting where to set our process to minimize density, ignoring the differences suggested by the two models could lead to artificially high confidence in the results.
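For either polynomial, the location of the minimum follows from setting the first derivative of the fitted curve to zero and keeping the root where the second derivative is positive. The sketch below works this out in closed form. The function names are hypothetical, and because the fitted coefficients are not reported in this column, the example calls use made-up coefficients.

```python
import math


def poly_value(coefs, x):
    """Evaluate b0 + b1*x + b2*x^2 + ... at x."""
    return sum(b * x**k for k, b in enumerate(coefs))


def quadratic_minimum(b0, b1, b2):
    """Location and value of the minimum of b0 + b1*x + b2*x^2 (needs b2 > 0)."""
    if b2 <= 0:
        raise ValueError("the quadratic has no minimum unless b2 > 0")
    x_min = -b1 / (2.0 * b2)
    return x_min, poly_value((b0, b1, b2), x_min)


def cubic_local_minimum(b0, b1, b2, b3):
    """Location and value of the local minimum of a cubic (needs b3 != 0)."""
    disc = b2**2 - 3.0 * b1 * b3  # from b1 + 2*b2*x + 3*b3*x^2 = 0
    if b3 == 0 or disc <= 0:
        raise ValueError("the cubic has no local minimum")
    # At this root the second derivative equals 2*sqrt(disc) > 0.
    x_min = (-b2 + math.sqrt(disc)) / (3.0 * b3)
    return x_min, poly_value((b0, b1, b2, b3), x_min)


# Made-up coefficients, for illustration only.
print(quadratic_minimum(12.0, -0.4, 0.02))
print(cubic_local_minimum(13.0, -0.6, 0.04, -0.0008))
```

A cubic has no global minimum, so the local minimum is meaningful only if it falls inside the range of concentrations that were studied.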
Quantity of interest
The quantity of interest also can influence the relative contributions of model and model parameter uncertainty. If the goal was to determine the estimated density as a function of concentration for explanatory variable values between five and 25, the estimated curves and the associated CIs are relatively close, with quite a bit of overlap. Hence, model parameter uncertainty likely contributes more than model uncertainty (at a given concentration, the CIs for each model are wider than the gap between the two estimated curves).
Things change, however, if we are interested in the curves near the extreme ends of the data range (for example, for concentrations near zero or near 30). In these cases, the relative contributions of model and model parameter uncertainty reverse, with model uncertainty contributing more to the overall uncertainty (the gap between the two estimated curves becomes larger relative to the width of the CIs at a given concentration). Of course, extrapolation beyond the range of the data with polynomials has well-documented dangers,⁴ and the burden of having the correct model increases if the model is used to estimate outside the observed data.
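One informal way to see where each source dominates is to compare, at each concentration, the gap between the two fitted curves with the typical CI half-width. The sketch below does this for simulated data. The ratio is an illustrative diagnostic of my own choosing rather than an established statistic, and the data, function name and tabulated critical values are assumptions.

```python
import numpy as np


def curve_with_ci(conc, dens, degree, grid, t_crit):
    """Fitted polynomial curve and CI half-width on a grid of concentrations."""
    center, scale = conc.mean(), conc.std()  # scaling improves conditioning
    X = np.vander((conc - center) / scale, degree + 1, increasing=True)
    G = np.vander((grid - center) / scale, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, dens, rcond=None)
    resid = dens - X @ beta
    sigma2 = resid @ resid / (len(dens) - X.shape[1])
    xtx_inv = np.linalg.inv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", G, xtx_inv, G)
    return G @ beta, t_crit * np.sqrt(sigma2 * leverage)


# Simulated stand-in data (not the engineer's observations).
rng = np.random.default_rng(0)
conc = np.linspace(1.0, 29.0, 30)
dens = 12.0 - 0.4 * conc + 0.02 * conc**2 + rng.normal(0.0, 0.3, conc.size)

grid = np.linspace(0.0, 30.0, 7)
# Tabulated 97.5th percentiles of t with 27 df and 26 df.
fit_q, half_q = curve_with_ci(conc, dens, 2, grid, t_crit=2.052)
fit_c, half_c = curve_with_ci(conc, dens, 3, grid, t_crit=2.056)

# Gap between the curves relative to the average CI half-width.
# Values above 1 suggest the choice of model matters more than
# the uncertainty in the parameters at that concentration.
ratio = np.abs(fit_q - fit_c) / (0.5 * (half_q + half_c))
for x, r in zip(grid, ratio):
    print(f"conc={x:5.1f}  gap / CI half-width = {r:.2f}")
```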
So, what is best in terms of reporting results to take into account model uncertainty as well as model parameter uncertainty?
First, it is important to acknowledge the process by which a model was chosen. Looking at several possible models and selecting one to focus on has some potential for selecting incorrectly and drawing false conclusions. Hence, if several models look reasonable based on the data and other knowledge, it can be beneficial to continue to explore the results for all of these competitive models.
In this case, we considered results from both models. If further exploration or data collection were performed to find the minimum density, continuing to evaluate concentrations between 9.5 and 11.5 is likely merited, not just close to 9.7 as the cubic model suggests.
Second, by comparing results from several possible models, we can assess the relative contributions of the two types of uncertainty. If we had looked only at Figure 1, it would not have been easy to see that this model might not be the best one available.
By plotting the two estimated models with their CIs in the same plot in Figure 3, we can better see the subtle differences that distinguish them.
Third, it is helpful, when possible, to report several alternatives. For some of the cases involving reliability that I have worked on, the worst-case reliability from all of the leading models is presented as an overall lower bound for possible reliability.
This can be a helpful bound when the consequences of an error in overestimating reliability are large. Alternative strategies in the statistics literature for acknowledging and incorporating model uncertainty include Bayesian model averaging⁵ and propagating model uncertainty.⁶
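To give a flavor of model averaging, the sketch below uses a common BIC-based approximation to posterior model weights, assuming equal prior weight on each model and Gaussian errors. It is a simplified stand-in and not the full procedure described in the references. The residual sums of squares and the predictions are hypothetical values, not results from the density example.

```python
import math


def bic_gaussian(sse, n, n_params):
    """BIC for a least-squares fit with Gaussian errors, up to a constant."""
    return n * math.log(sse / n) + n_params * math.log(n)


def approximate_model_weights(bics):
    """Weights proportional to exp(-BIC / 2), assuming equal prior weights."""
    best = min(bics)  # subtract the best BIC for numerical stability
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical residual sums of squares for two fits to n = 30 points.
n = 30
bics = [
    bic_gaussian(sse=3.0, n=n, n_params=3),  # quadratic: 3 coefficients
    bic_gaussian(sse=2.2, n=n, n_params=4),  # cubic: 4 coefficients
]
weights = approximate_model_weights(bics)

# Hypothetical predictions from each model at one common concentration.
predictions = [10.10, 9.95]
averaged = sum(w * p for w, p in zip(weights, predictions))
print(f"weights = {[round(w, 3) for w in weights]}, averaged = {averaged:.3f}")
```

The averaged prediction leans toward the model the data support more strongly, while still giving the competing model some influence.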
One more model
Another common version of model uncertainty occurs when we have multiple explanatory variables. In this case, we may have several models using different subsets of the explanatory variables that perform similarly well.
Here, the risk of choosing a single model and ignoring other contenders is potentially even greater. If we dismiss an explanatory variable from further consideration, we risk losing track of a potential mechanism that might be driving changes in our response. Christine M. Anderson-Cook, Jerome Morzinski and Kenneth D. Blecker describe a process for considering multiple models and identifying a subset of leading candidates.⁷
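A minimal sketch of this situation is to fit every subset of the explanatory variables and rank the fits by PRESS, then look at how close the leading candidates are. This is a generic all-subsets comparison and not the process described in the cited paper. The data are simulated, with two deliberately correlated variables so that rival subsets perform similarly, and the function names are hypothetical.

```python
import itertools

import numpy as np


def press_statistic(X, y):
    """Leave-one-out prediction error sum of squares for a linear model."""
    hat = X @ np.linalg.pinv(X)
    resid = y - hat @ y
    return float(np.sum((resid / (1.0 - np.diag(hat))) ** 2))


def rank_subsets(predictors, y):
    """Fit every non-empty subset of columns; return (PRESS, columns) sorted."""
    n, k = predictors.shape
    results = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            X = np.column_stack([np.ones(n), predictors[:, list(cols)]])
            results.append((press_statistic(X, y), cols))
    return sorted(results)


# Simulated data: x1 is nearly a copy of x0, so subsets that swap
# one for the other should predict almost equally well.
rng = np.random.default_rng(1)
n = 40
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
predictors = np.column_stack([x0, x1, x2, x3])
y = 2.0 + 1.5 * x0 - 1.0 * x2 + 0.5 * rng.normal(size=n)

ranked = rank_subsets(predictors, y)
for press, cols in ranked[:5]:
    print(f"PRESS = {press:7.3f}  variables = {cols}")
```

When several subsets have nearly identical PRESS values, carrying all of them forward keeps each candidate mechanism in view.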
A final comment about different types of uncertainty: When we are designing experiments, it is important to build in the ability to assess the quality of the fit of our model.⁸ If we only design our experiment to perform well for the assumed model, and it turns out that the model is incorrect and perhaps too simplistic, a poorly chosen experiment might not allow us to discover the mistake.
Choosing a well-designed experiment that balances good estimation if the model is correct against protection if the model is wrong, while retaining the capability to check for lack of fit, is a large topic for discussion.⁹,¹⁰
Imagine if our engineer had not had the ability to explore the cubic model. This could have hidden this source of uncertainty from further investigation and led to suboptimal conclusions.
References and note
1. For more details about the difference between confidence and prediction intervals, see Christine M. Anderson-Cook, "Interval Training: Answering the Right Question With the Right Interval," Quality Progress, October 2009, pp. 58-60.
2. Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining, Introduction to Linear Regression Analysis, third edition, Wiley, 2001, pp. 152-154.
3. Christine M. Anderson-Cook, Yongtao Cao and Lu Lu, "Maximize, Minimize or Target," Quality Progress, April 2016, pp. 52-55.
4. Montgomery, Introduction to Linear Regression Analysis, see reference 2.
5. Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky, "Bayesian Model Averaging: A Tutorial," Statistical Science, Vol. 14, No. 4, 1999, pp. 382-401.
6. David Draper, "Assessment and Propagation of Model Uncertainty," Journal of the Royal Statistical Society B, Vol. 57, No. 1, 1995, pp. 45-97.
7. Christine M. Anderson-Cook, Jerome Morzinski and Kenneth D. Blecker, "Statistical Model Selection for Better Prediction and Discovering Science Mechanisms That Affect Reliability," Systems, Vol. 3, No. 3, 2015, pp. 109-132.
8. Christine M. Anderson-Cook, "A Matter of Trust: Balance Confidence in Your Model While Avoiding Pitfalls," Quality Progress, March 2010, pp. 56-58.
9. Lu Lu, Christine M. Anderson-Cook and Timothy J. Robinson, "Optimization of Designed Experiments Based on Multiple Criteria Utilizing a Pareto Frontier," Technometrics, Vol. 53, No. 4, 2011, pp. 353-365.
10. Lu Lu and Christine M. Anderson-Cook, "Rethinking the Optimal Response Surface Design for a First-Order Model With Two-Factor Interactions, When Protecting Against Curvature," Quality Engineering, Vol. 24, No. 3, 2012, pp. 404-422.
Christine M. Anderson-Cook is a research scientist in the Statistical Sciences Group at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of ASQ and the American Statistical Association.