How To Choose the Appropriate Design

by Christine Anderson-Cook

When planning an experiment, you can consider many possible sets of observations. Choosing the combinations of factors at which to collect data to make up the best design involves balancing multiple goals and objectives.

You should begin by specifying a parametric model of interest to summarize the features of the relationship between the response and the explanatory variables.

Consider, for example, an experiment to model the connection between two variables, X1 and X2, and a response, Y. If you believe the relationship is reasonably summarized by a flat plane, then the first order model, Y = β0 + β1X1 + β2X2 + ε, may be appropriate. If you believe the relationship will have some curvature, then the first order model with interaction, Y = β0 + β1X1 + β2X2 + β12X1X2 + ε, or the second order model, Y = β0 + β1X1 + β2X2 + β12X1X2 + β11X1² + β22X2² + ε, may be more appropriate.

If you select a model with too few parameters, then it may not capture all the features of the data. If you select a model with too many parameters, you can later perform some tests to determine whether the model can be simplified to more accurately reflect the true characteristics demonstrated by the observed data. However, if you select too large an initial model, it may reduce the precision with which you are able to estimate the parameters included in the final model.1

Balance Competing Objectives

Once an initial model has been chosen, you need to identify a criterion (or criteria) for the selection process for a best design. Typically, a good design will simultaneously balance a number of competing objectives.2 Which design is most appropriate depends on the model chosen and the specific goals of the study. Two common primary goals of conducting an experiment and then modeling the results are:

  1. To be able to estimate the parameters of the model.
  2. To predict the response at any new location in the design space.

Several numerical summaries for estimation and prediction can be used to make direct comparisons between potential designs. To understand the possible measures, look at a simple example involving an experiment with two factors in a square design space, scaled so each factor ranges between -1 and +1, for a first order model with interaction of the form Y = β0 + β1X1 + β2X2 + β12X1X2 + ε. In this situation, consider a 2² factorial design with two replicates at each location and one center run (design A) and a 3² factorial design (design B). Both designs are shown in Figure 1.
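To make the two candidate designs concrete, they can be written out in code. The sketch below is mine, not part of the original article; the helper name model_matrix is arbitrary. It builds both nine-run designs and expands them into model matrices for the first order model with interaction:

```python
import numpy as np

def model_matrix(points):
    """Model matrix for Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 + error.

    Columns: intercept, X1, X2, X1*X2.
    """
    return np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in points])

# Design A: 2^2 factorial, two replicates at each corner, plus one center run
corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
design_a = corners * 2 + [(0, 0)]

# Design B: 3^2 factorial (all combinations of -1, 0, +1)
design_b = [(x1, x2) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]

Xa, Xb = model_matrix(design_a), model_matrix(design_b)
print(Xa.shape, Xb.shape)  # both designs: N = 9 runs, p = 4 model columns
```

Both designs spend the same budget of nine observations; they differ only in where those observations are placed.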


D- and A-optimality are possible choices to assess the quality of estimation of the model parameters. D-optimality focuses on desirable properties for the moment matrix of a design, which is defined as

M = XᵀX / N,

where X is a matrix that summarizes the design locations in the appropriate form for the model selected and N is the total number of observations for the design.

The moment matrix of a design is important for assessing the quality of the design because the inverse of M, M⁻¹ = N(XᵀX)⁻¹, contains the variances and covariances of the model parameters. D-optimality seeks to maximize the determinant of the moment matrix, |M|, which is equivalent to minimizing the determinant of M⁻¹. This minimization corresponds to finding small variances for the model parameters. Scaling by N allows you to compare designs of different sizes more fairly, on an information per observation basis.

Note the moment matrix can be calculated for any design without collecting any data from an actual experiment. This is a desirable feature for an optimality criterion because it lets you select a good design before performing the experiment.

The X-matrixes for the example in this article are

Xa = | 1  -1  -1   1 |        Xb = | 1  -1  -1   1 |
     | 1   1  -1  -1 |             | 1  -1   0   0 |
     | 1  -1   1  -1 |             | 1  -1   1  -1 |
     | 1   1   1   1 |             | 1   0  -1   0 |
     | 1  -1  -1   1 |             | 1   0   0   0 |
     | 1   1  -1  -1 |             | 1   0   1   0 |
     | 1  -1   1  -1 |             | 1   1  -1  -1 |
     | 1   1   1   1 |             | 1   1   0   0 |
     | 1   0   0   0 |             | 1   1   1   1 |

The first column corresponds to the intercept, β0; the second and third columns correspond to the main effect parameters, β1 and β2, respectively; and the fourth column corresponds to the interaction term, β12, which can be calculated by taking the product of the entries from the second and third columns. Each row of the X-matrix corresponds to one of the observations in the design. For example, the first row of Xa corresponds to this point in the design space: (X1,X2) = (-1,-1).

The matching moment matrixes for the two designs are

Ma = (1/9) XaᵀXa = | 1    0    0    0  |
                   | 0   8/9   0    0  |
                   | 0    0   8/9   0  |
                   | 0    0    0   8/9 |

Mb = (1/9) XbᵀXb = | 1    0    0    0  |
                   | 0   2/3   0    0  |
                   | 0    0   2/3   0  |
                   | 0    0    0   4/9 |

where the fraction, 1/9, for each design indicates both designs have nine observations.
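The moment matrixes can be checked numerically. This is a sketch of my own (the helper name model_matrix is arbitrary), not code from the article:

```python
import numpy as np

def model_matrix(points):
    # Columns: intercept, X1, X2, X1*X2
    return np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in points])

corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
Xa = model_matrix(corners * 2 + [(0, 0)])                            # design A
Xb = model_matrix([(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)])  # design B

M_a = Xa.T @ Xa / len(Xa)  # moment matrix M = X'X / N
M_b = Xb.T @ Xb / len(Xb)
# Both are diagonal here: every pair of distinct columns has a zero dot product,
# so the parameter estimates are uncorrelated for both designs.
```

Because both moment matrixes are diagonal, their determinants and the traces of their inverses reduce to simple products and sums of the diagonal entries.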

To calculate the relative performance based on the D-criterion, examine the determinants, |Ma| = 0.702 and |Mb| = 0.198. Based on these results, you can conclude the first design performs considerably better than the second. More formally, the relative D-efficiency of design A to design B is defined as Deff = {|Ma|/|Mb|}^(1/p), where p is the number of parameters in the model. Therefore, Deff = {0.702/0.198}^(1/4) = 1.37, which is the same as saying design A is 37% more D-efficient than design B.
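This D-efficiency calculation can be reproduced in a few lines. The code below is a self-contained sketch of mine, not the article's:

```python
import numpy as np

def model_matrix(points):
    # Columns: intercept, X1, X2, X1*X2
    return np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in points])

corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
Xa = model_matrix(corners * 2 + [(0, 0)])                            # design A
Xb = model_matrix([(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)])  # design B

det_a = np.linalg.det(Xa.T @ Xa / len(Xa))  # |Ma| ≈ 0.702
det_b = np.linalg.det(Xb.T @ Xb / len(Xb))  # |Mb| ≈ 0.198
p = 4                                       # parameters in the model
d_eff = (det_a / det_b) ** (1 / p)          # ≈ 1.37, as quoted in the article
```

The exponent 1/p puts the efficiency on a per-parameter scale, so the ratio can be read as a percentage advantage regardless of how many parameters the model has.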


A-optimality seeks to minimize the trace of M⁻¹ = N(XᵀX)⁻¹. This is equivalent to minimizing the sum of the variances of all the parameters. Although some computer packages use A-optimality as a criterion of interest for comparing designs, it is not always the best choice because it focuses on the variances of the parameter estimates and ignores the covariance structure between parameters.

In the example in this article, tr(Ma⁻¹) = 4.375 and tr(Mb⁻¹) = 6.25. Based on both criteria for parameter estimation, the replicated 2² factorial design with one center run is a better design than the 3² factorial design for the first order model with interaction. This should not be surprising: design B, with some observations at the midpoints of the X1 and X2 ranges, is a better design for estimating the full second order model, while design A, with its first eight design points at their extreme values, provides better estimates of the first order and interaction terms. This again indicates the choice of a best design is a function of the chosen model.
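The A-criterion values can be verified the same way. Again, this is a self-contained sketch of mine rather than code from the article:

```python
import numpy as np

def model_matrix(points):
    # Columns: intercept, X1, X2, X1*X2
    return np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in points])

corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
Xa = model_matrix(corners * 2 + [(0, 0)])                            # design A
Xb = model_matrix([(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)])  # design B

M_a = Xa.T @ Xa / len(Xa)
M_b = Xb.T @ Xb / len(Xb)

# A-criterion: trace of the inverse moment matrix
# = sum of the (scaled) variances of the parameter estimates
tr_a = np.trace(np.linalg.inv(M_a))  # 4.375
tr_b = np.trace(np.linalg.inv(M_b))  # 6.25
```

The smaller trace for design A confirms the same ranking the D-criterion gave.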

Scaled Predictive Variance

An alternative focus for choosing a best design is to consider how well the designs will be able to predict new observations at any location in the design space. A common choice for examining prediction performance is to look at the scaled prediction variance (SPV) across the design space. For a particular location, x(m), expressed in terms of the model, the scaled prediction variance is SPV = N x(m)ᵀ(XᵀX)⁻¹ x(m).

Based on our example, suppose there is interest in prediction at the location (X1,X2) = (-0.8,0.4). Then x(m) would be a 4 × 1 vector with elements 1, -0.8, 0.4 and -0.32, where the final entry corresponds to the interaction term, and its value was obtained by multiplying the X1 and X2 values.
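As a sketch, the SPV at this location can be computed directly from the formula. The code and the resulting numbers below are my own calculation for illustration; the article does not quote SPV values at this particular point:

```python
import numpy as np

def model_matrix(points):
    # Columns: intercept, X1, X2, X1*X2
    return np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in points])

corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
Xa = model_matrix(corners * 2 + [(0, 0)])                            # design A
Xb = model_matrix([(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)])  # design B

# Point of interest expanded in model form: (1, X1, X2, X1*X2)
x_m = np.array([1.0, -0.8, 0.4, -0.8 * 0.4])

# SPV = N * x' (X'X)^-1 x
spv_a = len(Xa) * x_m @ np.linalg.inv(Xa.T @ Xa) @ x_m  # ≈ 2.02 (my calculation)
spv_b = len(Xb) * x_m @ np.linalg.inv(Xb.T @ Xb) @ x_m  # ≈ 2.43 (my calculation)
```

At this interior location, design A again gives the smaller scaled prediction variance.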

One important difference between this measure of the design and those for good estimation of the parameters is that you obtain SPV values throughout the design space. Design A and design B in Figure 2 show perspective plots of the SPV values across the square design space for each of the two example designs.

Two optimality criteria are commonly used to reduce the SPV values to a single number: G-optimality and V-optimality. G-optimality seeks to minimize the worst prediction variance anywhere in the design space. This is an appealing criterion for comparing designs because, before the experiment, you will not necessarily know where you will want to predict, so you want the worst case prediction variance to be as low as possible. For example, the largest SPV values for both designs occur at the four corners of the design space. The worst prediction variance is 4.375 for design A and 6.25 for design B. Based on this criterion, you would choose design A.

V-optimality (also called Q-, IV- and I-optimality) seeks to minimize the average prediction variance across the design space. While this is more difficult to calculate than G-optimality, looking at average behavior across the design space may more accurately capture the important characteristics of the distribution than just its most extreme value. For example, the average SPVs for designs A and B are approximately 1.89 and 2.27, respectively. In this case, the ranking of the two designs for all four optimality criteria leads to the same conclusion: Design A is a better choice for the model, Y = β0 + β1X1 + β2X2 + β12X1X2 + ε, in the square design space with X1 and X2 each between -1 and +1.
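Both G- and V-type summaries can be approximated by evaluating the SPV on a dense grid over the design space. This is a sketch of my own; the article does not say how its averages were computed, and a grid average only approximates the integral behind the V-criterion:

```python
import numpy as np

def model_matrix(points):
    # Columns: intercept, X1, X2, X1*X2
    return np.array([[1.0, x1, x2, x1 * x2] for x1, x2 in points])

corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
Xa = model_matrix(corners * 2 + [(0, 0)])                            # design A
Xb = model_matrix([(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)])  # design B

# Evaluate SPV at every point of a fine grid over the square design space
grid = np.linspace(-1, 1, 201)
F = model_matrix([(u, v) for u in grid for v in grid])

# Row-wise quadratic forms: SPV_i = N * f_i' (X'X)^-1 f_i for each grid point
spv_a = np.einsum('ij,jk,ik->i', F, len(Xa) * np.linalg.inv(Xa.T @ Xa), F)
spv_b = np.einsum('ij,jk,ik->i', F, len(Xb) * np.linalg.inv(Xb.T @ Xb), F)

# G-criterion: worst case; V-criterion: average over the design space
print(spv_a.max(), spv_b.max())    # 4.375 and 6.25, attained at the corners
print(spv_a.mean(), spv_b.mean())  # ≈ 1.89 and ≈ 2.27
```

The grid maxima reproduce the G-criterion values exactly (the corners are grid points), and the grid averages land close to the article's approximate V-criterion figures.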

Fortunately, in many situations, including those that involve completely randomized designs with first order and first order with interaction linear models, the ranking of designs based on the various optimality criteria is the same. In these cases, there is no need to choose between estimation and prediction criteria: A good design can achieve both goals.

In other situations, including those that involve higher order models and split-plot experiments with restrictions on randomization for some factors, good estimation and good prediction may lead to different design choices. In these cases, it is important to carefully re-examine the primary goals of experimentation to determine which objective is more important. In some experiments, both good estimation and prediction are desirable.

If different criteria lead to different best designs, it may make more sense to choose a compromise design that may not be optimal for either objective but will perform quite well for both. New tools, such as genetic algorithms,3 can create customized designs for nonstandard situations that satisfy a particular optimality criterion.

When choosing an experimental design, think about which model is appropriate for the study, what region of interest is relevant for future prediction and whether estimation of parameters or future prediction is most important. These three aspects should be central to selecting a best design, so be sure to ask and answer these questions before running the experiment.


  1. For an alternative that finds a best design while balancing several competing models, read "Model-Robust Optimal Designs: A Genetic Algorithm Approach" by Alejandro Heredia-Langner, W.M. Carlyle, D.C. Montgomery and C.M. Borror (Journal of Quality Technology, Vol. 36, No. 3, 2004, pp. 263-279).
  2. For more details on some of the desirable characteristics we may strive to incorporate into our design selection, read Response Surface Methodology: Process and Product Optimization Using Designed Experiments, second edition, by R.H. Myers and D.C. Montgomery (John Wiley & Sons, 2002) and “Statistics Roundtable: Beyond Sample Size” by C.M. Anderson-Cook (Quality Progress, December 2004, pp. 88-90).
  3. Alejandro Heredia-Langner, W.M. Carlyle, D.C. Montgomery, C.M. Borror and G.C. Runger, "Genetic Algorithms for the Construction of D-Optimal Designs," Journal of Quality Technology, Vol. 35, No. 1, 2003, pp. 28-46.

CHRISTINE ANDERSON-COOK is a technical staff member of Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a Senior Member of ASQ.
