## 2020

STATISTICS SPOTLIGHT

# What’s Driving Uncertainty?

## The influences of model and model parameters in data analysis

by Christine M. Anderson-Cook

One of the substantial improvements to the practice of data analysis in recent decades is the change from reporting just a point estimate for a parameter or characteristic to now including a summary of uncertainty for that estimate. Understanding the precision of the estimate for the quantity of interest provides a better understanding of what to expect and how well we are able to predict future behavior from the process.

For example, when we report a sample average as an estimate of the population mean, it is good practice to also provide a confidence interval (CI)—or credible interval if you are doing a Bayesian analysis—to accompany that summary. This helps to calibrate what ranges of values are reasonable given the variability observed in the sample and the amount of data included in producing the summary.

### Estimating density example

Recently, I encountered an example that demonstrates the contributions from several sources we may wish to include in our assessment of the uncertainty. An engineer had obtained a data set with 30 observations that she wanted to use to estimate the density of a material of interest as a function of the concentration of the key ingredient. The overall goal is to identify at what concentration the density is minimized.

Subject matter expertise for the process suggested that a quadratic model of the form *Dens*ᵢ = *β*₀ + *β*₁*Conc*ᵢ + *β*₂*Conc*ᵢ² + *ε*ᵢ should be adequate to summarize the relationship between the explanatory variable, concentration, and the response, density. Figure 1 shows the results when that model was fit to the available data (using least-squares estimation) and a 95% CI for the curve. The CI provides uncertainty bounds for where the estimated mean curve lies, and differs from a prediction interval, which shows where we would expect new observations to fall if more data were collected from the same underlying mechanism.^{1}
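A least-squares quadratic fit with a pointwise CI for the mean curve can be sketched in a few lines. The data below are synthetic stand-ins (the engineer's 30 observations are not published here), and the t quantile is hard-coded for this illustrative sample size, so treat the numbers as demonstration only.

```python
import math

# Sketch: quadratic least-squares fit Dens = b0 + b1*Conc + b2*Conc^2 with a
# pointwise 95% CI for the mean response. Data are synthetic stand-ins.

def solve(A, rhs):
    """Gauss-Jordan solve of a small linear system A x = rhs."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def quad_fit_ci(x, y, x0, t_quantile):
    """Fit the quadratic and return (yhat, lower, upper) for the mean at x0."""
    n = len(x)
    X = [[1.0, xi, xi * xi] for xi in x]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(3)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 3)
    x0v = [1.0, x0, x0 * x0]
    # Leverage h = x0' (X'X)^{-1} x0, via solving (X'X) z = x0, then x0 . z.
    z = solve(XtX, x0v)
    h = sum(a * b for a, b in zip(x0v, z))
    yhat = sum(b * v for b, v in zip(beta, x0v))
    half = t_quantile * math.sqrt(s2 * h)
    return yhat, yhat - half, yhat + half

conc = [5, 7, 9, 10, 11, 13, 15, 17, 20, 25]
dens = [10.9, 10.4, 10.05, 10.0, 10.05, 10.3, 10.7, 11.3, 12.5, 15.2]
# t_{0.975, 7} ≈ 2.365 for these 10 synthetic points and three parameters.
yhat, lower, upper = quad_fit_ci(conc, dens, 10.0, 2.365)
```

Sweeping `x0` over a grid of concentrations traces out the full CI band shown in a plot like Figure 1.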

At first glance, the model seems to fit reasonably well with the overall trends in the data being appropriately captured by the estimated model.

The engineer also decided to explore a slightly more complicated model, which allows extra flexibility to consider additional curvature. Hence, in addition to fitting the quadratic model, she also fit a cubic model of the form *Dens*ᵢ = *β*₀ + *β*₁*Conc*ᵢ + *β*₂*Conc*ᵢ² + *β*₃*Conc*ᵢ³ + *ε*ᵢ to see whether this provided an improved fit. The results of this fit are shown in Figure 2, with the accompanying 95% CI.

Superficially, the curve also seems to fit the data well, although the general shape does show some notable differences from the quadratic model. For larger concentrations (on the right-hand side of the plot), the rate of increase of the curve seems to diminish with the cubic model, and the shape around the minimum also seems to differ. Table 1 shows a formal comparison of the two models.

*R*² (the larger, the better) summarizes the fraction of the total variability of density observed in the sample that is explained by each model. Adjusted *R*² adds a penalty for larger models and generally is a better summary than *R*² for comparing models of different sizes. The predicted residual error sum of squares (PRESS) statistic^{2} (the smaller, the better) is a form of cross-validation that assesses the model's ability to predict.

Based on the
adjusted *R*², the cubic model is preferred. Using the
PRESS statistic, the quadratic model is preferred. When we look at a formal
test of the cubic term, we reject the null hypothesis that it has a value of
zero (p-value ≈ 0.001) and conclude
that there is strong evidence that this term should not be removed from the
model.
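The mechanics of that formal test are simple once the coefficient estimate and its standard error are in hand. In the sketch below, both numbers are hypothetical (the article does not report them), and a normal approximation stands in for the exact t reference distribution, which is close for roughly 26 residual degrees of freedom.

```python
import math

# Sketch: two-sided test of H0: beta3 = 0 from an estimate and its standard
# error. Both inputs are hypothetical; the normal approximation below stands
# in for the exact t distribution.

def two_sided_p(estimate, se):
    """Two-sided p-value for H0: coefficient = 0, via a normal approximation."""
    z = abs(estimate / se)
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

p_value = two_sided_p(0.0031, 0.0009)  # hypothetical beta3 estimate and SE
```

A small p-value here says only that the cubic term improves the fit to these data; as the column goes on to argue, it does not by itself settle which model should be reported.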

Therefore, there is some ambiguity about which model is preferred, compounded by the fact that current engineering understanding of the relationship suggests a quadratic model.

### What model to choose?

Traditionally, it has been common at this point to pick the better model—here, perhaps the cubic model—and report the estimated line from this model with its associated uncertainty as the summary of the results. The CI shown in Figure 2 captures the uncertainty associated with estimating the model parameters—conditional on that model being correct. Hence, this is often referred to as model parameter uncertainty. There is, however, clearly more going on here. We have gone through a process by which we considered more than one model, selected which model is best and now want to report what uncertainty to associate with estimating that curve.

There is another source of uncertainty that we also should acknowledge and account for in our reporting. Did we, in fact, choose the right model? This idea is captured by model uncertainty, and reflects a potentially bigger contributor to the outcome of our study, the interpretation of results and our overall confidence in reported results. Its source lies in the process that we use for selecting the final model on which to report, and in some cases, may play a bigger role in affecting our predictions than the model parameter uncertainty.

For our
engineer, the main goal was to identify at what concentration the minimum
density occurs, and the expected value of the density at that location. If we
just consider the cubic model, the minimum density is estimated to occur at a
concentration of 9.7 with a value of 9.92. The CI at that concentration
suggests a range of the mean density from 9.56 to 10.28. Christine M.
Anderson-Cook, Yongtao Cao and Lu Lu
provide suggestions about how to provide a summary of the uncertainty for
identifying the ideal concentration value.^{3}

If, however, we consider the possibility that the quadratic model is the true model (after all, the science suggests that this might be the right one), the minimum density is estimated to occur at a concentration of 10.8 with a value of 10.05 (95% CI is [9.59, 10.51]). Figure 3 shows the two estimated curves overlaid, with the quadratic curve and its CI shown in red and the cubic model in blue. The solid colored circles on each estimated line provide the best indication of where the minimum density lies for each curve.

In this case, the values of the minimum as well as the locations of the minima both differ. If we were going to select where to set our process to optimize, ignoring the differences suggested by the two models could lead to artificially high confidence in the results.
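Each model's minimum can be located analytically by setting its derivative to zero. The coefficients in the sketch below are hypothetical placeholders chosen so the minima land near the article's reported locations; they are not the actual fitted estimates.

```python
import math

# Sketch: locating each fitted curve's minimum analytically. The coefficients
# used below are hypothetical placeholders, not the article's actual estimates.

def quad_min(b1, b2):
    # d/dConc of (b0 + b1*Conc + b2*Conc^2) is zero at -b1/(2*b2);
    # that stationary point is a minimum when b2 > 0.
    return -b1 / (2 * b2)

def cubic_min(b1, b2, b3):
    # Set the derivative b1 + 2*b2*Conc + 3*b3*Conc^2 to zero and keep the
    # root where the second derivative 2*b2 + 6*b3*Conc is positive.
    a, b, c = 3 * b3, 2 * b2, b1
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # no stationary points
    roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]
    minima = [r for r in roots if 2 * b2 + 6 * b3 * r > 0]
    return minima[0] if minima else None

print(quad_min(-0.432, 0.02))           # ≈ 10.8 for these placeholder values
print(cubic_min(0.582, -0.1755, 0.01))  # ≈ 9.7 for these placeholder values
```

Running both calculations side by side makes the disagreement between the two models concrete: the minima differ in location, not just in value.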

### Quantity of interest

The quantity of interest also can influence the relative contributions of model and model parameter uncertainty. If the goal was to determine the estimated density as a function of concentration for explanatory variable values between five and 25, the estimated curves and the associated CIs are relatively close—with quite a bit of overlap. Hence, model parameter uncertainty likely contributes more than the model uncertainty (the widths of the CIs at a given concentration for each model are wider than the differences between the two sets of colored lines).

Things change, however, if we are interested in the
curves near the extreme end of the data set range—for example, for
concentrations near zero or near 30. In these cases, the relative contributions
of model and model parameter uncertainty reverse, with model uncertainty
contributing more to the overall uncertainty (the difference between the two
colored curves becomes larger relative to the width of the CIs at a given
concentration). Of course, extrapolation beyond the range of the data with polynomials has well-documented dangers,^{4} and the burden of having the model correct increases if the model is used to estimate outside the observed data.

So, what is best in terms of reporting results to take into account model uncertainty as well as model parameter uncertainty?

First, it is important to acknowledge the process by which a model was chosen. Looking at several possible models and selecting one to focus on has some potential for selecting incorrectly and drawing false conclusions. Hence, if several models look reasonable based on the data and other knowledge, it can be beneficial to continue to explore the results for all of these competitive models.

In this case, we considered results from both models. If further exploration or data collection were performed to find the minimum density, continuing to evaluate concentrations between 9.5 and 11.5 is likely merited, not just close to 9.7 as the cubic model suggests.

Second, by comparing results from several possible models, we can assess the relative contributions of the two types of uncertainty. If we had looked only at Figure 1, it would not have been easy to see that this model might not be the best possible one we could find.

By plotting the two estimated models with their CIs in the same plot in Figure 3, we can better see the subtle differences that distinguish them.

Third, it is helpful, when possible, to report several alternatives. For some of the reliability cases that I have worked on, the worst-case reliability from all of the leading models is presented as an overall lower bound for possible reliability.

This can be a helpful bound when the consequences
of an error in overestimating reliability are large. Alternate strategies in
the statistics literature for acknowledging and incorporating model uncertainty
include Bayesian model averaging^{5} and propagating model uncertainty.^{6}

### One more model

Another common version of model uncertainty occurs when we have multiple explanatory variables. In this case, we may have several models using different subsets of the explanatory variables that perform similarly well.

Here, the risk of choosing a single model and
ignoring other contenders is potentially even greater. If we dismiss an
explanatory variable from further consideration, we risk losing track of a
potential mechanism that might be driving changes in our response. Christine M.
Anderson-Cook, Jerome Morzinski and Kenneth D. Blecker describe a process for considering multiple models
and identifying a subset of leading candidates.^{7}

A final comment about different types of
uncertainty: When we are designing experiments, it is important to build in the
ability to assess the quality of the fit of our model.^{8} If we only
design our experiment to perform well for the assumed model, and it turns out that
the model is incorrect and perhaps too simplistic, a poorly chosen experiment
might not allow us to discover the mistake.

Choosing a well-designed experiment that balances good estimation if the model is correct against protection if the model is wrong, while retaining the capability to check for lack of fit, is a large topic for discussion.^{9, 10}

Imagine if our engineer had not had the ability to explore the cubic model. This could have hidden this source of uncertainty from further investigation and led to suboptimal conclusions.

### References and note

1. For more details about the difference between confidence and prediction intervals, see Christine M. Anderson-Cook, "Interval Training: Answering the Right Question With the Right Interval," *Quality Progress*, October 2009, pp. 58-60.
2. Douglas C. Montgomery, Elizabeth A. Peck and G. Geoffrey Vining, *Introduction to Linear Regression Analysis*, third edition, Wiley, 2001, pp. 152-154.
3. Christine M. Anderson-Cook, Yongtao Cao and Lu Lu, "Maximize, Minimize or Target," *Quality Progress*, April 2016, pp. 52-55.
4. Montgomery, Peck and Vining, *Introduction to Linear Regression Analysis*, see reference 2.
5. Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky, "Bayesian Model Averaging: A Tutorial," *Statistical Science*, Vol. 14, No. 4, 1999, pp. 382-401.
6. David Draper, "Assessment and Propagation of Model Uncertainty," *Journal of the Royal Statistical Society B*, Vol. 57, No. 1, 1995, pp. 45-97.
7. Christine M. Anderson-Cook, Jerome Morzinski and Kenneth D. Blecker, "Statistical Model Selection for Better Prediction and Discovering Science Mechanisms That Affect Reliability," *Systems*, Vol. 3, No. 3, 2015, pp. 109-132.
8. Christine M. Anderson-Cook, "A Matter of Trust: Balance Confidence in Your Model While Avoiding Pitfalls," *Quality Progress*, March 2010, pp. 56-58.
9. Lu Lu, Christine M. Anderson-Cook and Timothy J. Robinson, "Optimization of Designed Experiments Based on Multiple Criteria Utilizing a Pareto Frontier," *Technometrics*, Vol. 53, No. 4, 2011, pp. 353-365.
10. Lu Lu and Christine M. Anderson-Cook, "Rethinking the Optimal Response Surface Design for a First-Order Model With Two-Factor Interactions, When Protecting Against Curvature," *Quality Engineering*, Vol. 24, No. 3, 2012, pp. 404-422.

**Christine M.
Anderson-Cook** is a research scientist in the Statistical Sciences Group
at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in
statistics from the University of Waterloo in Ontario. Anderson-Cook is a
fellow of ASQ and the American Statistical Association.
