A Matter of Trust
Balance confidence in your model while avoiding pitfalls
by Christine M. Anderson-Cook
Every day, we balance trusting that things will work with protecting against things that might go wrong.
When we drive around town, we trust other drivers will stop at red lights, but we also drive defensively to anticipate the occasional driver who might try to speed through the intersection on a vanishing yellow. While checking out at the grocery store cash register, we mostly trust the prices being scanned by the computer, but we may also spot-check some items we suspect the scanner misread.
Trusting that things will work as intended vastly increases the efficiency of our lives. Imagine if you double-checked every item in the grocery store line for accuracy or hesitated at every intersection when a car approached from another direction. It would be very difficult to get anything done.
On the other hand, if you never checked or anticipated potential problems, you could miss problems and expose yourself to undue risk and consequences.
Designing a data-collection plan for an experiment should involve some of the same balance between trusting your knowledge of the system and maintaining healthy skepticism about whether you have accurately described the features of the system. Knowing how to evaluate your choices is a function of being able to articulate your assumptions about how things work and the potential problems you want to protect against.
We must consider the importance of thoughtful model specification and include provisions for estimates of pure error and lack of fit when choosing a good design.
To illustrate some of the potential pitfalls, consider this example: Suppose you're interested in estimating Y as a function of a single explanatory variable, X. You believe the standard simple linear regression model with a straight line, Y = β0 + β1X + ε, will adequately model this relationship. Figure 1 shows an optimal design for this with minimum sample size.
The label "optimal" can be somewhat deceptive because it implies you have the best design possible, but it is worth asking, "What was the optimization problem that we solved to obtain this solution?"
For designed experiments, we typically optimize conditional on the following assumptions:
- The specified model being correct.
- A particular sample size for our experiment.
- A particular design optimality criterion.
The design in Figure 1 is best for:
- The straight-line model being the correct model for the underlying relationship between Y and X.
- A saturated design, namely one in which the number of observations equals the number of model parameters.
- Using D-optimality as the design optimality criterion. Recall that a D-optimal design estimates the model parameters (β0, β1) well.
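This conditional notion of "optimal" is easy to check numerically for the straight-line case. As a minimal sketch (not from the article), the following compares det(X'X), the quantity D-optimality maximizes, for two candidate two-run designs on a coded [-1, 1] range; the endpoint design wins:

```python
import numpy as np

def d_value(points):
    """det(X'X) for the straight-line model y = b0 + b1*x."""
    X = np.column_stack([np.ones(len(points)), points])  # model matrix
    return np.linalg.det(X.T @ X)

# Candidate two-run designs on the coded interval [-1, 1]
print(d_value([-1.0, 1.0]))   # endpoints of the range
print(d_value([-0.5, 0.5]))   # interior points -- smaller determinant
```

Pushing the two runs to the extremes of the X range maximizes the determinant, which is exactly the design in Figure 1.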
It’s easy to see how implementing the design in Figure 1 could turn out badly. For instance, if the true relationship is not a straight line, your design offers no way to inform you that your model is inadequate. In fact, Figure 2 shows there are many possible relationships between Y and X that could lead to the same observed values at the chosen design locations.
If you add a single observation in the middle of the X range, you improve your ability to assess the adequacy of your model. Because you expect the data also will have some associated statistical error, however, this simple augmentation does not completely solve the problem. Figure 3 shows two possible models that are both reasonable for the data observed.
You could continue to believe the straight-line relationship is correct and the observed scatter of points is due to statistical error, or you could believe the statistical error is small and the true relationship is more appropriately summarized with a quadratic model.
Both are plausible, given the data you have observed. To further resolve this, you could obtain replicates (perhaps three to five) at each design location. This would allow you to estimate σ² independently from the choice of models and resolve which of the two explanations is more plausible.
Figure 4 shows how including replicates can help resolve the ambiguity of which explanation is most reasonable. If the replicated data resemble the left-hand side, it's likely the straight-line model is not adequate to summarize the relationship. If the data look like the right-hand side, you might believe the straight line is consistent with what you have observed.
Thus, sometimes you can do better than "optimal." In this case, that means proposing a design that solves the problem you really have, instead of a restricted version that does not balance the risks of something going wrong.
Taking it further
So, how can you generalize these results to more complicated situations?
- It is helpful to have more design locations than are mathematically required to uniquely estimate the model parameters. In the earlier example, you have two parameters (β0, β1). If you have just two design locations, you don't have the ability to assess whether the model is adequate. By adding an additional design location, you can obtain some information about whether your model is rich enough to capture the features of the true relationship.
- Having replicates built into your design enables you to estimate the natural variability of the data. This, in turn, helps calibrate whether the observed data are consistent with the assumed relationship through some form of lack-of-fit test.
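As a rough illustration of the second point, here is a sketch of a lack-of-fit F-test with invented data: three replicates at each of three X locations, with the replicates used to separate pure error from lack of fit. The data values are hypothetical, chosen so the straight-line fit fails:

```python
import numpy as np

# Hypothetical replicated data: three runs at each of x = -1, 0, 1
x = np.array([-1, -1, -1, 0, 0, 0, 1, 1, 1], dtype=float)
y = np.array([1.1, 0.9, 1.0, 2.6, 2.4, 2.5, 1.0, 1.2, 1.1])  # curved pattern

# Fit the straight-line model by least squares
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Pure error: variation of replicates around their location means
sse_pe = sum(((y[x == v] - y[x == v].mean()) ** 2).sum() for v in np.unique(x))
df_pe = len(y) - len(np.unique(x))       # 9 runs - 3 locations = 6

# Lack of fit: what remains of the residual SS after removing pure error
sse_lof = (resid ** 2).sum() - sse_pe
df_lof = len(np.unique(x)) - X.shape[1]  # 3 locations - 2 parameters = 1

F = (sse_lof / df_lof) / (sse_pe / df_pe)
print(F)  # a large F signals the straight line is inadequate
```

With this invented data, the replicates agree tightly at each location while the location means bow upward, so the lack-of-fit mean square dwarfs the pure-error mean square.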
Now, consider a slightly more complicated scenario: You would like to conduct an experiment to predict future observations of a response based on two input factors, X1 and X2. Your best knowledge (perhaps from previous studies or scientific understanding of the process) indicates a first-order model with a two-way interaction should be adequate to describe the relationship. Hence, an appropriate model would have the form: Y = β0 + β1X1 + β2X2 + β12X1X2 + ε.
A standard choice of "optimal" design would be a replicated 2² factorial experiment.¹ Figure 5 shows the layout of this design with three replicates at each design location. Again, it is helpful to carefully specify the conditions under which this design is optimal. In this case, you wish to consider:
- A first-order model with the two-way interaction.
- A design with 12 runs.
Note here you have already recognized the need for replication by increasing your sample size.
Sometimes, you’re lucky because the optimal design for one set of conditions leads to optimal results for other objectives.
The replicated 2² factorial experiment is exceptional because it is the optimal design for any combination of the following:
- The true underlying model being a first-order model with interaction or a main effects only model.
- Any multiple of four runs. For example, for an optimal design with total sample size of 20, you would replicate the 2² factorial with five runs at each location.
- User-specified D, G or I criteria. Recall that G-optimal (or I-optimal) designs minimize the maximum (or average) prediction variance in the region of interest, either the circle or square shown in Figure 5.
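One way to see why the replicated 2² factorial is so well behaved is its orthogonality: the columns of the model matrix are mutually orthogonal, so X'X is a multiple of the identity. A short sketch (assuming coded ±1 units for the Figure 5 layout) verifies this for the 12-run design:

```python
import itertools
import numpy as np

# Three replicates of the 2^2 factorial in coded units (Figure 5 layout)
corners = list(itertools.product([-1.0, 1.0], repeat=2))
runs = corners * 3                    # 12 runs total

# Model matrix for the first-order-plus-interaction model
X = np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in runs])
XtX = X.T @ X
print(XtX)                            # 12 times the identity matrix
```

Because X'X is diagonal, every coefficient is estimated independently and with equal precision, which is what makes the design simultaneously D-, G- and I-optimal for this model.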
This level of robustness to a multitude of criteria is characteristic of factorial designs when lower order models describe the process. In general, you shouldn’t expect to be so lucky. Despite its many appealing characteristics, the factorial design is not a panacea.
So, where could you get into trouble? What might you want to protect against? And how should you think about the balance between trusting and protecting?
Building in provisions
You can see that you would have problems with your design if you believe a second-order model might be needed to explain the relationship between Y and (X1, X2). With four design locations and four model parameters, you have no ability to assess the adequacy of your chosen model.
Figure 6 suggests an alternative design, which augments the 22 factorial with runs at a center location. With this addition, you can evaluate whether there is more curvature in your model than what can be estimated by the first-order model with the two-factor interaction.
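The classic way to use those center runs is a curvature check: the difference between the average corner response and the average center response estimates the combined pure-quadratic effect. A sketch with invented response values (the numbers are hypothetical, not from the article):

```python
import numpy as np

# Hypothetical responses: replicated 2^2 corners plus four center runs
y_corner = np.array([5.2, 4.8, 5.1, 5.0, 4.9, 5.1, 5.0, 4.9])
y_center = np.array([6.4, 6.6, 6.5, 6.5])

# Corner average minus center average estimates the total pure-quadratic
# curvature; near zero means the first-order-plus-interaction model is
# adequate for the region studied.
curvature = y_corner.mean() - y_center.mean()

# Scale it by a pure-error estimate from the center replicates
s2 = ((y_center - y_center.mean()) ** 2).sum() / (len(y_center) - 1)
se = np.sqrt(s2 * (1 / len(y_corner) + 1 / len(y_center)))
print(curvature / se)  # a large |t| signals curvature the model misses
```

Here the invented center responses sit well above the plane through the corners, so the statistic is far from zero and the first-order model would be rejected.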
Because you have replicates built into the design, you can obtain a good estimate of pure error, thus enabling you to estimate σ² independently of the choice of model. Granted, the design is not optimal for the D, G or I criteria, but recall that all of those criteria are predicated on the model being correct.
In addition to being efficient when everything goes well, you also need to build in provisions that help avoid dire consequences when things go badly. You need to use different strategies to identify if the model might not be adequate. With the alternative design (in Figure 6) that allows you to evaluate the need for a second-order model, you have not sacrificed that much for the estimation of the model parameters.
Consider the relative precision of the two designs: The Figure 5 design has a standard deviation for each of (β0, β1, β2, β12) of 0.288σ, compared with 0.354σ for the design in Figure 6. As with life, being cautious and protecting against potential risks has a cost in efficiency, but it also offers the ability to foresee and mitigate some problems.²
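Those two standard deviations can be reproduced from the diagonal of (X'X)⁻¹, assuming the Figure 6 design replicates each corner twice and adds four center runs for the same 12-run total (an assumption about the figure, which is not reproduced here). Under that layout, the slope and interaction coefficients come out near 0.354σ, while the intercept stays near 0.289σ:

```python
import numpy as np

def coef_sd(design):
    """Std. devs. of (b0, b1, b2, b12) in units of sigma for a design."""
    X = np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in design])
    return np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
fig5 = corners * 3                 # 12 runs, three replicates per corner
fig6 = corners * 2 + [(0, 0)] * 4  # assumed layout: 8 corner + 4 center runs

print(coef_sd(fig5))  # all four coefficients near 0.289
print(coef_sd(fig6))  # slopes and interaction near 0.354
```

The efficiency loss is modest: each slope's standard deviation grows by a factor of about 1.22 in exchange for the ability to detect curvature and estimate pure error.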
Choices to make
When designing an experiment, think carefully about the choice of model, the available sample size and the optimality criteria that should be used in selecting the design.
Using an optimal design can be a good starting point for creating a good data collection strategy. In many cases, however, augmenting the design with some additional runs to allow estimation of lack of fit from a more complex model can be highly beneficial.
Also, including adequate replicates to allow for good estimation of pure error can help give a more realistic assessment of how well the model fits the data with an estimate of the natural variability independent of the assumed model.
References
1. Raymond H. Myers, Douglas C. Montgomery and Christine M. Anderson-Cook, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, third edition, Wiley, 2009, pp. 286-290.
2. Ibid., pp. 109-114, for more details on the calculation of this standard deviation.
In addition to p. 282 of Myers, Montgomery and Anderson-Cook’s Response Surface Methodology: Process and Product Optimization Using Designed Experiments, another helpful resource that offers other criteria to consider when building a good designed experiment is George E.P. Box and Norman R. Draper’s "A Basis for the Selection of a Response Surface Design," Journal of the American Statistical Association, 1959, Vol. 54, pp. 622-654.
Christine M. Anderson-Cook is a research scientist at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of the American Statistical Association and a senior member of ASQ.