## 2020

STATISTICS ROUNDTABLE

# How To Choose the Appropriate Design

**by Christine Anderson-Cook**

When planning an experiment, you can consider many possible sets of observations. Choosing the combinations of factors at which to collect data to make up the best design involves balancing multiple goals and objectives.

You should begin by specifying a parametric model of interest to summarize the features of the relationship between the response and the explanatory variables.

Consider, for example, an experiment to model the connection between two variables, X_{1} and X_{2}, and a response, Y. If you believe the relationship is reasonably summarized by a flat plane, then the first order model, Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + ε, may be appropriate. If you believe the relationship will have some curvature, then the first order model with interaction, Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{12}X_{1}X_{2} + ε, or the second order model, Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{12}X_{1}X_{2} + β_{11}X_{1}^{2} + β_{22}X_{2}^{2} + ε, may be more appropriate.
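To make the difference between these candidate models concrete, the Python sketch below expands a point (X_{1}, X_{2}) into the regressor terms each model uses. The function name `model_terms` and the model labels are our own illustrative choices, not anything from the article; the point is only that the three models require three, four and six parameters, respectively.

```python
def model_terms(x1, x2, model):
    """Return the regressor values (intercept first) for one design point."""
    if model == "first_order":
        # Y = b0 + b1*X1 + b2*X2 + e
        return [1, x1, x2]
    if model == "interaction":
        # Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 + e
        return [1, x1, x2, x1 * x2]
    if model == "second_order":
        # Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 + b11*X1^2 + b22*X2^2 + e
        return [1, x1, x2, x1 * x2, x1 ** 2, x2 ** 2]
    raise ValueError(f"unknown model: {model}")


# The number of terms is the number of parameters p each model must estimate.
for name in ("first_order", "interaction", "second_order"):
    print(name, len(model_terms(0.5, -1.0, name)))  # 3, 4 and 6 terms
```

Each additional term is one more parameter to estimate from the same set of runs, which is why the initial choice of model matters for the design.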

If you select a model with too few parameters, it may not capture all the features of the data. If you select a model with too many parameters, you can later perform tests to determine whether the model can be simplified to better reflect the characteristics of the observed data. However, starting with too large an initial model may reduce the precision with which you can estimate the parameters included in the final model.^{1}

### Balance Competing Objectives

Once an initial model has been chosen, you need to identify one or more criteria for selecting the best design. Typically, a good design will simultaneously balance a number of competing objectives.^{2} Which design is most appropriate depends on the model chosen and the specific goals of the study. Two common primary goals of conducting an experiment and then modeling the results are:

- To be able to estimate the parameters of the model.
- To predict the response at any new location in the design space.

Several numerical summaries of estimation and prediction performance can be used to make direct comparisons between potential designs. To understand the possible measures, look at a simple example: an experiment with two factors in a square design space, with each factor scaled to lie between -1 and +1, and a first order model with interaction of the form Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{12}X_{1}X_{2} + ε. Consider two candidate designs: a 2^{2} factorial design with two replicates at each location and one center run (design A), and a 3^{2} factorial design (design B). Both designs are shown in Figure 1.
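The two candidate designs can be written down directly as lists of (X_{1}, X_{2}) locations. This is a minimal Python sketch; the names `corners`, `design_a` and `design_b` are hypothetical, and the run order produced by `itertools.product` is an arbitrary choice that does not affect any of the criteria discussed below.

```python
from itertools import product

# Design A: the four corners of the square, each replicated twice, plus one center run.
corners = list(product([-1, 1], repeat=2))
design_a = corners * 2 + [(0, 0)]

# Design B: the full 3^2 factorial, every combination of -1, 0 and +1.
design_b = list(product([-1, 0, 1], repeat=2))

print(len(design_a), len(design_b))  # 9 9
print(sorted(set(design_a)))  # the five distinct locations in design A
```

Both designs use N = 9 runs, which keeps the comparison on an equal footing.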

### D-Optimality

D- and A-optimality are possible choices to assess the quality of estimation of the model parameters. D-optimality focuses on desirable properties of the moment matrix of a design, which is defined as

$$
M = \frac{X^{T}X}{N},
$$

where X is a matrix that summarizes the design locations in the appropriate form for the model selected and N is the total number of observations for the design.

The moment matrix of a design is important for assessing the quality of the design because its inverse, M^{-1} = N(X^{T}X)^{-1}, is proportional to the variance-covariance matrix of the estimated model parameters: for least squares estimates with error variance σ^{2}, that matrix is σ^{2}(X^{T}X)^{-1} = (σ^{2}/N)M^{-1}. D-optimality seeks to maximize the determinant of the moment matrix, |M|, which is equivalent to minimizing the determinant of M^{-1}. This minimization corresponds to finding small variances for the parameter estimates. Scaling by N lets you compare designs of different sizes more fairly, on an information-per-observation basis.

Note the moment matrix can be calculated for any design without collecting any data from an actual experiment. This is a desirable feature for an optimality criterion because it lets you select a good design before you have to perform it.

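The X-matrixes for the example in this article are

$$
X_{a} = \begin{pmatrix}
1 & -1 & -1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & 1 & 1 & 1 \\
1 & -1 & -1 & 1 \\
1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 \\
1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0
\end{pmatrix}, \qquad
X_{b} = \begin{pmatrix}
1 & -1 & -1 & 1 \\
1 & -1 & 0 & 0 \\
1 & -1 & 1 & -1 \\
1 & 0 & -1 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 1 & -1 & -1 \\
1 & 1 & 0 & 0 \\
1 & 1 & 1 & 1
\end{pmatrix}.
$$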

The first column corresponds to the intercept, β_{0}; the second and third columns correspond to the main effect parameters, β_{1} and β_{2}, respectively; and the fourth column corresponds to the interaction term, β_{12}, whose entries are the products of the entries in the second and third columns. Each row of the X-matrix corresponds to one of the observations in the design. For example, the first row of X_{a} corresponds to the point (X_{1},X_{2}) = (-1,-1) in the design space.
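A short Python sketch of how these X-matrixes can be built row by row from the design points. The helper `x_row` is a hypothetical name; it appends the product X_{1}X_{2} as the fourth column. The run order is an assumption of the sketch: it puts the point (-1,-1) first, as in the text, and the center run of design A last.

```python
from itertools import product

def x_row(x1, x2):
    """One row of X for the first order model with interaction."""
    # Columns: intercept, X1, X2, and X1*X2 (product of the 2nd and 3rd columns).
    return [1, x1, x2, x1 * x2]

# Hypothetical run order; any ordering of the rows gives the same analysis.
design_a = list(product([-1, 1], repeat=2)) * 2 + [(0, 0)]
design_b = list(product([-1, 0, 1], repeat=2))

X_a = [x_row(x1, x2) for x1, x2 in design_a]
X_b = [x_row(x1, x2) for x1, x2 in design_b]

for row in X_a:
    print(row)  # first row: [1, -1, -1, 1] for the point (-1, -1)
```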

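The matching moment matrixes for the two designs are

$$
M_{a} = \frac{1}{9}\begin{pmatrix}
9 & 0 & 0 & 0 \\
0 & 8 & 0 & 0 \\
0 & 0 & 8 & 0 \\
0 & 0 & 0 & 8
\end{pmatrix}, \qquad
M_{b} = \frac{1}{9}\begin{pmatrix}
9 & 0 & 0 & 0 \\
0 & 6 & 0 & 0 \\
0 & 0 & 6 & 0 \\
0 & 0 & 0 & 4
\end{pmatrix},
$$

where the fraction, 1/9, for each design indicates both designs have nine observations.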
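The moment matrixes can be checked numerically. This sketch uses plain Python with `fractions.Fraction` so the 1/9 factors stay exact; the helper names `x_row` and `moment_matrix` are our own. It computes M = X^{T}X/N for each design and prints 9M, that is, the integer matrix X^{T}X.

```python
from fractions import Fraction
from itertools import product

def x_row(x1, x2):
    # Intercept, X1, X2 and the X1*X2 interaction.
    return [1, x1, x2, x1 * x2]

def moment_matrix(X):
    """M = X^T X / N, kept exact with Fraction."""
    n, p = len(X), len(X[0])
    return [[Fraction(sum(row[i] * row[j] for row in X), n) for j in range(p)]
            for i in range(p)]

design_a = list(product([-1, 1], repeat=2)) * 2 + [(0, 0)]
design_b = list(product([-1, 0, 1], repeat=2))
M_a = moment_matrix([x_row(*pt) for pt in design_a])
M_b = moment_matrix([x_row(*pt) for pt in design_b])

# Multiplying by N = 9 recovers the integer matrix X^T X inside the 1/9 factor.
for name, M in (("M_a", M_a), ("M_b", M_b)):
    print(name, [[int(9 * v) for v in row] for row in M])
```

Both matrixes come out diagonal because the columns of X are mutually orthogonal for these two designs.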

To calculate the relative performance based on the D-criterion, examine the determinants, |M_{a}| = 0.702 and |M_{b}| = 0.198. Based on these results, you can conclude the first design performs considerably better than the second design. More formally, the relative D-efficiency of design A to B is defined as Deff = {|M_{a}|/|M_{b}|}^{1/p}, where p is the number of parameters in the model (here, p = 4). Therefore, Deff = {0.702/0.198}^{1/4} = 1.37, which is the same as saying design A is 37% more D-efficient than design B.
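The determinants and the relative D-efficiency can be reproduced with a few lines of Python. This is a sketch, not a production routine: `det` is a small Gaussian-elimination helper written for this example, and all quantities stay exact fractions until the final p-th root.

```python
from fractions import Fraction
from itertools import product

def x_row(x1, x2):
    # Intercept, X1, X2 and the X1*X2 interaction.
    return [1, x1, x2, x1 * x2]

def moment_matrix(X):
    n, p = len(X), len(X[0])
    return [[Fraction(sum(r[i] * r[j] for r in X), n) for j in range(p)] for i in range(p)]

def det(A):
    """Determinant via Gaussian elimination with exact fractions."""
    A = [list(row) for row in A]
    n, result = len(A), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result  # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return result

design_a = list(product([-1, 1], repeat=2)) * 2 + [(0, 0)]
design_b = list(product([-1, 0, 1], repeat=2))
det_a = det(moment_matrix([x_row(*pt) for pt in design_a]))
det_b = det(moment_matrix([x_row(*pt) for pt in design_b]))

p = 4  # parameters: b0, b1, b2, b12
d_eff = float(det_a / det_b) ** (1 / p)  # relative D-efficiency of A to B
print(f"|M_a| = {float(det_a):.3f}, |M_b| = {float(det_b):.3f}, Deff = {d_eff:.2f}")
```

The exact determinants are (8/9)^{3} = 512/729 and 16/81, whose ratio is 32/9; its fourth root is about 1.37.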

### A-Optimality

A-optimality seeks to minimize the trace of M^{-1} = N(X^{T}X)^{-1}. This is equivalent to minimizing the sum of the (scaled) variances of all the parameter estimates. Although some computer packages use A-optimality as a criterion for comparing designs, it is not always the best choice because it focuses on the variances of the parameter estimates and ignores the covariance structure between them.

In the example in this article, tr(M_{a}^{-1}) = 4.375 and tr(M_{b}^{-1}) = 6.25. Based on both criteria for parameter estimation, the replicated 2^{2} factorial design with one center run is a better design than the 3^{2} factorial design for the first order model with interaction. This should not be surprising: design B, with some observations at the midpoints of the X_{1} and X_{2} ranges, is a better design for estimating the full second order model, while design A, with its first eight design points at their extreme values, provides better estimates of the first order and interaction terms. This again indicates the choice of a best design is a function of the chosen model.
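The A-criterion values above can be reproduced directly. The sketch below inverts each moment matrix with a small Gauss-Jordan helper (`inverse`, written for this example; in practice a numerical library would normally do this) and sums the diagonal of M^{-1}.

```python
from fractions import Fraction
from itertools import product

def x_row(x1, x2):
    # Intercept, X1, X2 and the X1*X2 interaction.
    return [1, x1, x2, x1 * x2]

def moment_matrix(X):
    n, p = len(X), len(X[0])
    return [[Fraction(sum(r[i] * r[j] for r in X), n) for j in range(p)] for i in range(p)]

def inverse(A):
    """Gauss-Jordan inverse with exact fractions; assumes A is nonsingular."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]  # scale pivot row so the pivot is 1
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def a_criterion(design):
    """tr(M^{-1}); smaller is better under A-optimality."""
    M_inv = inverse(moment_matrix([x_row(*pt) for pt in design]))
    return sum(M_inv[i][i] for i in range(len(M_inv)))

design_a = list(product([-1, 1], repeat=2)) * 2 + [(0, 0)]
design_b = list(product([-1, 0, 1], repeat=2))
trace_a, trace_b = a_criterion(design_a), a_criterion(design_b)
print(float(trace_a), float(trace_b))  # 4.375 6.25
```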

### Scaled Predictive Variance

An alternative focus for choosing a best design is to consider how well the designs will be able to predict new observations at any location in the design space. A common choice for examining prediction performance is to look at the scaled prediction variance (SPV) across the design space. For a particular location, x^{(m)}, expanded to match the terms of the model, the scaled prediction variance is SPV = Nx^{(m)T}(X^{T}X)^{-1}x^{(m)}, which is the variance of the predicted response at that location multiplied by N/σ^{2}.

Based on our example, suppose there is interest in prediction at the location (X_{1},X_{2}) = (-0.8,0.4). Then x^{(m)} would be a 4 × 1 vector with elements 1, -0.8, 0.4 and -0.32, where the final entry corresponds to the interaction term and was obtained by multiplying the X_{1} and X_{2} values.
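A sketch of the SPV calculation at (X_{1},X_{2}) = (-0.8,0.4). It uses the identity Nx^{(m)T}(X^{T}X)^{-1}x^{(m)} = x^{(m)T}M^{-1}x^{(m)}; the helper names are our own, and the resulting values (about 2.02 for design A and 2.43 for design B) are computed by this sketch rather than reported in the article.

```python
from fractions import Fraction
from itertools import product

def x_row(x1, x2):
    # Model terms: intercept, X1, X2, X1*X2.
    return [1, x1, x2, x1 * x2]

def moment_matrix(X):
    n, p = len(X), len(X[0])
    return [[Fraction(sum(r[i] * r[j] for r in X), n) for j in range(p)] for i in range(p)]

def inverse(A):
    """Gauss-Jordan inverse with exact fractions; assumes A is nonsingular."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def spv(design, x1, x2):
    """SPV = N x^T (X^T X)^{-1} x, computed as x^T M^{-1} x."""
    M_inv = inverse(moment_matrix([x_row(*pt) for pt in design]))
    x = x_row(x1, x2)
    return sum(x[i] * M_inv[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))

design_a = list(product([-1, 1], repeat=2)) * 2 + [(0, 0)]
design_b = list(product([-1, 0, 1], repeat=2))

# (X1, X2) = (-0.8, 0.4) written as exact fractions.
px1, px2 = Fraction(-4, 5), Fraction(2, 5)
print([float(v) for v in x_row(px1, px2)])  # [1.0, -0.8, 0.4, -0.32]
spv_a, spv_b = spv(design_a, px1, px2), spv(design_b, px1, px2)
print(float(spv_a), float(spv_b))  # 2.0152 2.4304
```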

One important difference between this measure and those for good estimation of the parameters is that SPV values are obtained throughout the design space. Figure 2 shows perspective plots of the SPV values across the square design space for design A and design B.

Two optimality criteria are commonly used to reduce the SPV values to a single number: G-optimality and V-optimality. G-optimality seeks to minimize the worst (largest) SPV anywhere in the design space. This is an appealing criterion for comparing designs because, before the experiment, you will not necessarily know where you will want to predict, so you will want to make sure the worst case prediction variance is as low as possible. In the example, the largest SPV values for both designs occur at each of the four corners of the design space. The worst prediction variance is 4.375 for design A and 6.25 for design B. Based on this criterion, you would choose design A.
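A rough way to check the G-criterion numbers is to evaluate the SPV on a grid over the square and take the maximum. The grid search is an assumption of this sketch, and the helper names are our own: in general a grid only approximates the continuous maximum, but here the maximum sits at the corners, which the grid includes, so the results match the 4.375 and 6.25 quoted above.

```python
from fractions import Fraction
from itertools import product

def x_row(x1, x2):
    # Model terms: intercept, X1, X2, X1*X2.
    return [1, x1, x2, x1 * x2]

def moment_matrix(X):
    n, p = len(X), len(X[0])
    return [[Fraction(sum(r[i] * r[j] for r in X), n) for j in range(p)] for i in range(p)]

def inverse(A):
    """Gauss-Jordan inverse with exact fractions; assumes A is nonsingular."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def spv_at(M_inv, x1, x2):
    """x^T M^{-1} x at one location."""
    x = x_row(x1, x2)
    return sum(x[i] * M_inv[i][j] * x[j] for i in range(4) for j in range(4))

def g_criterion(design, steps=20):
    """Largest SPV over a (steps+1) x (steps+1) grid covering [-1, 1]^2."""
    M_inv = inverse(moment_matrix([x_row(*pt) for pt in design]))
    grid = [Fraction(2 * k, steps) - 1 for k in range(steps + 1)]
    worst = max(((a, b) for a in grid for b in grid), key=lambda pt: spv_at(M_inv, *pt))
    return spv_at(M_inv, *worst), worst

design_a = list(product([-1, 1], repeat=2)) * 2 + [(0, 0)]
design_b = list(product([-1, 0, 1], repeat=2))
g_a, g_b = g_criterion(design_a), g_criterion(design_b)
for name, (value, location) in (("A", g_a), ("B", g_b)):
    print(name, float(value), [float(v) for v in location])
```

Because `max` returns the first maximizer it meets, the reported location is the corner (-1,-1); the other three corners tie with it.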

V-optimality (also called Q-, IV- and I-optimality) seeks to minimize the average prediction variance across the design space. While this is more difficult to calculate than G-optimality, looking at average behavior across the design space may more accurately capture the important characteristics of the distribution than just its most extreme value. For example, the average SPVs for designs A and B are approximately 1.89 and 2.27, respectively. In this case, all four optimality criteria rank the two designs the same way and lead to the same conclusion: design A is a better choice for the model, Y = β_{0} + β_{1}X_{1} + β_{2}X_{2} + β_{12}X_{1}X_{2} + ε, in the square design space with X_{1} and X_{2} both in [-1,+1].
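For this model and region, the average SPV also has a closed form: averaging x^{T}M^{-1}x uniformly over the square gives tr(M^{-1}μ), where μ holds the averages of the products of model terms over the region (μ, `MU` and the helper names are our notation, not the article's). The sketch below computes it exactly. It gives 1.875 for design A and 2.25 for design B, close to the approximate values quoted above and with the same ranking.

```python
from fractions import Fraction
from itertools import product

def x_row(x1, x2):
    # Model terms: intercept, X1, X2, X1*X2.
    return [1, x1, x2, x1 * x2]

def moment_matrix(X):
    n, p = len(X), len(X[0])
    return [[Fraction(sum(r[i] * r[j] for r in X), n) for j in range(p)] for i in range(p)]

def inverse(A):
    """Gauss-Jordan inverse with exact fractions; assumes A is nonsingular."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def uniform_moment(k):
    """E[x^k] for x uniform on [-1, 1]: 0 for odd k, 1/(k + 1) for even k."""
    return Fraction(0) if k % 2 else Fraction(1, k + 1)

# Exponents (a, b) of X1^a * X2^b for the model terms 1, X1, X2 and X1*X2.
EXPONENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]

# MU[i][j] = average of term_i * term_j over the square (X1, X2 independent uniforms).
MU = [[uniform_moment(a1 + a2) * uniform_moment(b1 + b2) for (a2, b2) in EXPONENTS]
      for (a1, b1) in EXPONENTS]

def v_criterion(design):
    """Average SPV over [-1, 1]^2, using avg(x^T M^{-1} x) = tr(M^{-1} MU)."""
    M_inv = inverse(moment_matrix([x_row(*pt) for pt in design]))
    return sum(M_inv[i][j] * MU[j][i] for i in range(4) for j in range(4))

design_a = list(product([-1, 1], repeat=2)) * 2 + [(0, 0)]
design_b = list(product([-1, 0, 1], repeat=2))
v_a, v_b = v_criterion(design_a), v_criterion(design_b)
print(float(v_a), float(v_b))  # 1.875 2.25
```

The trace form avoids numerical integration; a fine grid average over the square should approach the same values.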

Fortunately, in many situations, including completely randomized designs with first order and first order with interaction linear models, the rankings of designs based on the various optimality criteria are the same. In these cases, there is no need to choose between estimation and prediction criteria: a good design can achieve both goals.

In other situations, including those that involve higher order models and split-plot experiments with restrictions on randomization for some factors, good estimation and good prediction may lead to different design choices. In these cases, it is important to carefully re-examine the primary goals of experimentation to determine which objective is more important. In some experiments, both good estimation and prediction are desirable.

If different criteria lead to different best designs, it may make more sense to choose a compromise design that is not optimal for either objective but performs quite well for both. New tools, such as genetic algorithms,^{3} can create customized designs for nonstandard situations that satisfy a particular optimality criterion.

When choosing an experimental design, think about which model is appropriate for the study, what region of interest is relevant for future prediction and whether estimation of parameters or future prediction is most important. These three aspects should be central to selecting a best design, so be sure to ask and answer these questions before running the experiment.

### REFERENCES AND NOTES

1. For an alternative that finds a best design while balancing several competing models, read “Model-Robust Optimal Designs: A Genetic Algorithm Approach” by Alejandro Heredia-Langer, W.M. Carlyle, D.C. Montgomery and C.M. Borror (Journal of Quality Technology, Vol. 36, No. 3, pp. 263-279).
2. For more details on some of the desirable characteristics we may strive to incorporate into our design selection, read Response Surface Methodology: Process and Product Optimization Using Designed Experiments, second edition, by R.H. Myers and D.C. Montgomery (John Wiley & Sons, 2002) and “Statistics Roundtable: Beyond Sample Size” by C.M. Anderson-Cook (Quality Progress, December 2004, pp. 88-90).
3. Alejandro Heredia-Langer, W.M. Carlyle, D.C. Montgomery, C.M. Borror and G.C. Runger, “Genetic Algorithms for the Construction of D-Optimal Designs,” Journal of Quality Technology, Vol. 35, No. 1, pp. 28-46.

**CHRISTINE ANDERSON-COOK** is a technical staff member of Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a Senior Member of ASQ.
