Beyond Sample Size
by Christine Anderson-Cook
As a statistical consultant who works with scientists and researchers in a number of areas, the question I have answered most frequently is, “How big a sample do I need?” Without knowing much about the problem under investigation, this question is nearly impossible to answer quickly.
It may not even be the right question to ask. If the research problem involves a designed experiment, the sample size needed might be much less interesting than determining what combination of controllable inputs should be considered.
In a designed experiment, a researcher has the ability to manipulate a number of the input or explanatory variables or factors that are thought to influence the response(s) of interest. So the question of “How many?” needs to be supplemented with “and which ones?” Decisions regarding a best choice of design will involve a number of trade-offs. It is unlikely any single design will be the right choice for all sets of priorities, so deciding on the priorities for the experiment will be important in selecting the best design.
Let’s begin with some basics. Determining which of the possible inputs should be explored in the current experiment can be a difficult decision. Choose too few, and the possible interrelationships between the explanatory variables can never be studied; choose too many, and budgetary or logistical constraints may not allow adequate information to be gathered about each of the factors.
Once a set of inputs has been chosen, appropriate ranges of each of the explanatory variables and the shape of the region (cubical and spherical are common choices) need to be selected to reflect the region of interest.
You then need to examine what you are trying to accomplish with the experiment. (Two books provide excellent summaries of the many diverse aspects of a good design you may want to consider, and I used them as the basis for the following discussions [1, 2].) It may be helpful to first divide the criteria for a good design into several categories roughly ordered by their importance:
1. Bare minimum characteristics that are required for the design to even be considered.
2. Criteria that measure how well a design does what is required.
3. Characteristics related to protecting the experimenter from a variety of unexpected occurrences.
In addition to these “goodness” properties, you will typically want a cost-effective design that is manageably small and convenient to implement.
The first set of characteristics can be thought of as those that put a design in the right ballpark for consideration. To start choosing the appropriate designs, you must consider how the researcher expects the explanatory variables and the responses to be related—what is the anticipated model?
A model that assumes a simple relationship between the inputs and response will typically require fewer combinations of inputs to be considered. Take, for example, an experiment to study the influence of three factors ($X_1$, $X_2$ and $X_3$) on a single response ($Y$) in a region of interest that is shaped like a cube. If you believe the relationship between inputs and response is quite simple and can be adequately summarized with a first order approximation, such as $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$, then the $2^{3-1}$ fractional factorial shown in Figure 1a with each of the factors observed at two levels might be adequate.
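As a minimal sketch of the first order case, the half fraction can be written out and checked for whether it supports the four-parameter model. The coding of levels as -1 and +1 and the choice of generator ($X_3 = X_1 X_2$) are assumptions made for illustration; the figure may show the other half fraction.

```python
import itertools
import numpy as np

# Full 2^2 factorial in X1 and X2, coded as -1 / +1 (assumed coding).
base = np.array(list(itertools.product([-1, 1], repeat=2)))

# Half fraction of the 2^3: generate X3 as X1 * X2 (assumed generator).
x3 = base[:, 0] * base[:, 1]
design = np.column_stack([base, x3])

# Model matrix for the first order model: intercept plus three main effects.
model_matrix = np.column_stack([np.ones(len(design)), design])

n_runs, n_params = model_matrix.shape
rank = np.linalg.matrix_rank(model_matrix)

print(design)
print(f"runs={n_runs}, parameters={n_params}, rank={rank}")
```

The rank equals the number of parameters, so all four coefficients can be estimated. With four runs and four parameters, however, no degrees of freedom remain for estimating variability.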
If some interactions between two or more factors were anticipated with a model of the form $Y = \beta_0 + \sum_i \beta_i X_i + \sum_{i<j} \beta_{ij} X_i X_j + \varepsilon$, then a $2^3$ factorial design as shown in Figure 1b might be more appropriate. However, if further complexity of the model were anticipated and the experimenter wanted to model some additional curvature in the relationship, a second order model of the form $Y = \beta_0 + \sum_i \beta_i X_i + \sum_{i<j} \beta_{ij} X_i X_j + \sum_i \beta_{ii} X_i^2 + \varepsilon$ could be considered.
To be able to estimate all the terms in this model, you might want to use a central composite [3] or Box-Behnken [4] design. As the nature of the relationship is expected to be increasingly complicated, more factor combinations need to be examined. To estimate all the parameters of the model, you will typically need to have more design points than parameters in the model and adequate numbers of levels of each factor to model the anticipated degree of curvature (see Figure 2).
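One way to see which three-factor designs can support which model is to build each model matrix and compare its rank with the number of parameters. The sketch below does this for the half fraction, the full $2^3$ factorial and a central composite design. The function names are hypothetical, and the central composite design is assumed to be face-centered (axial distance 1) with a single center run; other axial distances are common.

```python
import itertools
import numpy as np

def model_matrix(design, order):
    """Build model columns: intercept and main effects, then two-factor
    interactions ('interaction' or 'second'), then squares ('second')."""
    n, k = design.shape
    cols = [np.ones(n)] + [design[:, i] for i in range(k)]
    if order in ("interaction", "second"):
        cols += [design[:, i] * design[:, j]
                 for i, j in itertools.combinations(range(k), 2)]
    if order == "second":
        cols += [design[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def can_estimate(design, order):
    """True when the model matrix has full column rank."""
    X = model_matrix(design, order)
    return np.linalg.matrix_rank(X) == X.shape[1]

full = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))  # 2^3
half = full[full.prod(axis=1) == 1]           # 2^(3-1), X1*X2*X3 = +1
axial = np.vstack([np.eye(3), -np.eye(3)])    # axial points, distance 1 (assumed)
ccd = np.vstack([full, axial, np.zeros((1, 3))])  # one center run (assumed)

for name, d in {"2^(3-1)": half, "2^3": full, "CCD": ccd}.items():
    result = {o: can_estimate(d, o) for o in ("first", "interaction", "second")}
    print(f"{name:8s} runs={len(d):2d} {result}")
```

Under these assumptions, the two-level designs cannot separate the squared terms from the intercept, because every squared column equals 1 at every run. That is why a third level per factor is needed for the second order model.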
From the simplest first order model to the more complicated second order model, some of the standard three-factor designs have grown from a minimum of four design combinations to a maximum of more than 15. Clearly, one design does not serve all purposes, and you will have to make some difficult choices. You will also need to make some preliminary choices before any data are available to confirm or refute your theories of the nature of the relationship.
To further satisfy the basics, you should consider estimating variability to assess the model and choosing design points that provide adequate coverage of the region, so predictions can be made without extrapolation. The first consideration usually means having some factor combinations observed more than once (replicates) or having additional design locations to provide lack of fit degrees of freedom for testing.
Finally, if there are restrictions on how the data will be collected, such as limitations on the number of runs that can be collected in a day or other natural groups of homogeneous units, then you should choose a design that incorporates these restrictions. By satisfying this checklist of basics, you should have a design that can do what you need to.
Aspects of a Good Design
You will probably still have quite a few designs to consider after the first group of characteristics has been satisfied. To help narrow the field, you should consider how well the design will allow you to perform the analysis for which it has been chosen. The analysis goals of a designed experiment should fit into one of two broad categories:
1. Obtaining good estimates of the parameters of interest in the model.
2. Being able to make predictions in the region of interest.
Various alphabetic optimality criteria attempt to quantify these two goal categories. D- and A-optimality measure how well a design can estimate the model parameters; G- and V-optimality strive to quantify the quality of prediction [5]. In each case, the criteria summarize the goodness of the design with a single number, which makes comparisons between designs easy, but oversimplified.
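To make these single-number summaries concrete, here is a rough sketch that computes one common form of each criterion for a two-factor first order model. The definitions used are the usual textbook ones (determinant of $X'X$ for D, trace of $(X'X)^{-1}$ for A, maximum and average scaled prediction variance for G and V); the function names, the 21-by-21 evaluation grid and the "shrunken" comparison design are assumptions chosen for illustration.

```python
import itertools
import numpy as np

def expand(points):
    """First order model terms for two factors: 1, X1, X2."""
    points = np.atleast_2d(points)
    return np.column_stack([np.ones(len(points)), points])

def criteria(design, grid):
    X = expand(design)
    n_runs = len(X)
    info = X.T @ X                      # information matrix X'X
    inv = np.linalg.inv(info)
    F = expand(grid)
    # Scaled prediction variance N * x'(X'X)^-1 x at every grid point.
    spv = n_runs * np.einsum("ij,jk,ik->i", F, inv, F)
    return {
        "D: det(X'X), larger is better": float(np.linalg.det(info)),
        "A: trace((X'X)^-1), smaller is better": float(np.trace(inv)),
        "G: max SPV, smaller is better": float(spv.max()),
        "V: mean SPV, smaller is better": float(spv.mean()),
    }

# Grid approximation of the square region [-1, 1] x [-1, 1].
axis = np.linspace(-1, 1, 21)
grid = np.array(list(itertools.product(axis, axis)))

corners = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
factorial = criteria(corners, grid)        # 2^2 factorial at the corners
shrunken = criteria(0.5 * corners, grid)   # same layout pulled halfway in

for label in factorial:
    print(f"{label}: {factorial[label]:.3f} vs {shrunken[label]:.3f}")
```

The comparison shows why spreading points to the edge of the region helps on all four criteria for this model, though each number still hides how the design behaves across the region.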
There are several more factors of a good design. You want one that:
• Has good power to test hypotheses of interest (to correctly reject the null hypothesis when it is incorrect) and good precision for any interval estimates of parameters, both of which are commonly attained by having adequate degrees of freedom to measure natural variability.
• Has the ability to protect the researcher in cases in which the experiment does not turn out exactly as planned.
• Is robust to lost observations, outliers in the observed response or missettings in the input factor levels.
• Is sufficiently complex to summarize the true relationship between factors and response with a test for lack of fit.
• Allows the assumption of homogeneous variance of the response throughout the region of interest to be checked.
The many different aspects to a good design need to be considered simultaneously, because a design that excels at one characteristic may do so at the price of another. So you may want to think of design selection as a series of strategic trade-offs that help you most closely match the goals of a particular experiment.
To help understand the trade-offs, consider a simple example that involves two explanatory variables for a square design space with the goal of fitting a first order model with the interaction, $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \varepsilon$, and assume you have budget constraints that will allow you to consider only eight total observations.
Figures 3a and 3b illustrate two possible designs you might consider:
1. A replicated $2^2$ factorial (3a).
2. An unreplicated $2^2$ factorial with four center runs (3b).
Both designs can estimate the model of interest because there are at least as many design locations as parameters in the model and the points are spread out enough to allow for prediction throughout the entire region of interest without extrapolation.
Moving into the second and third categories of characteristics for good designs (those that measure the goodness of the design and protect against the unexpected), you will start to see some of the trade-offs to consider. The first design is both D- and G-optimal because it estimates the parameters of the model as well as possible and has the lowest possible maximum prediction variance for a design of this size. It allows you to assess the assumption of homogeneous variance throughout the design space by obtaining four estimates of variance, one at each corner. If an outlier were observed, a second observation at the same design location would perhaps allow the outlier to be identified.
The second design, on the other hand, will estimate the model parameters a little less precisely, but it will allow for a formal lack of fit test to see if there is quadratic curvature in the underlying relationship. Depending on the goal of the experiment, one design may be preferable, but, unfortunately, all the desirable characteristics of a design cannot be simultaneously optimized.
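The trade-off between the two eight-run designs can be checked numerically. The sketch below is an illustration under assumed -1/+1 coding, with hypothetical function and variable names. It compares the determinant of $X'X$, the variance multipliers for each coefficient, the maximum scaled prediction variance over a grid on the square, and the degrees of freedom available for pure error and for lack of fit.

```python
import itertools
import numpy as np

def model_matrix(points):
    """Terms of the first order model with interaction: 1, X1, X2, X1*X2."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.column_stack([np.ones(len(points)), x1, x2, x1 * x2])

def summarize(design, grid):
    X = model_matrix(design)
    n_runs, n_params = X.shape
    inv = np.linalg.inv(X.T @ X)
    F = model_matrix(grid)
    # Scaled prediction variance N * x'(X'X)^-1 x over the grid.
    spv = n_runs * np.einsum("ij,jk,ik->i", F, inv, F)
    n_distinct = len(np.unique(design, axis=0))
    return {
        "det_XtX": float(np.linalg.det(X.T @ X)),
        "coef_variance_factors": np.diag(inv),   # multiply by sigma^2
        "max_spv": float(spv.max()),
        "pure_error_df": n_runs - n_distinct,
        "lack_of_fit_df": n_distinct - n_params,
    }

corners = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
design_a = np.vstack([corners, corners])            # replicated 2^2
design_b = np.vstack([corners, np.zeros((4, 2))])   # 2^2 plus 4 center runs

axis = np.linspace(-1, 1, 21)
grid = np.array(list(itertools.product(axis, axis)))

summary_a = summarize(design_a, grid)
summary_b = summarize(design_b, grid)
for key in summary_a:
    print(key, summary_a[key], summary_b[key])
```

Under these assumptions, the replicated factorial has the larger determinant (4,096 versus 512) and the smaller maximum scaled prediction variance (4 versus 7), but it leaves no degrees of freedom for lack of fit. The design with center runs gives up some precision on the slope and interaction terms in exchange for one lack of fit degree of freedom.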
Understanding the many different aspects of a good design to consider during the selection stage of an experiment should remind you to think carefully about the goals of the experiment, what types of complications might be anticipated when running the experiment and how the results of the analysis will be used.
With the increase in the use of computer based tools to construct specialized designs for both standard and nonstandard situations, you have more choices than ever before, so you need to be judicious in your selection.
References
1. R.H. Myers and D.C. Montgomery, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, second edition, John Wiley and Sons, 2002.
2. G.E.P. Box and N.R. Draper, Empirical Model Building and Response Surfaces, John Wiley and Sons, 1987.
3. G.E.P. Box and K.B. Wilson, “On the Experimental Attainment of Optimum Conditions,” Journal of the Royal Statistical Society, Series B, 13, 1951, pp. 1-45.
4. G.E.P. Box and D.W. Behnken, “Some New Three-Level Designs for the Study of Quantitative Variables,” Technometrics, 1960, pp. 455-475.
5. Myers, Response Surface Methodology, see reference 1.
CHRISTINE ANDERSON-COOK is a technical staff member of Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a Senior Member of ASQ.
88 I DECEMBER 2004 I www.asq.org
Figure 1. $2^{3-1}$ Fractional Factorial Design (a) and $2^3$ Factorial Design (b)
Figure 2. Possible Relationships Between Explanatory Variable and Response
Note: Choosing the number of levels for each factor should be based on the anticipated complexity of the relationship between the inputs and the response. If you expect a mostly linear relationship as shown by the solid line, then the two black design location dots might be adequate. However, if you expect some curvature as shown with the dashed line, then you’ll need at least one additional point (gray dot). If you expect a complicated relationship as shown by the dotted line, then you will need to obtain many levels of the explanatory variable.
Figure 3. Two Competing Designs for a Two-Factor Experiment
If you would like to comment on this article, please post your remarks on the Quality Progress Discussion Board at www.asq.org, or e-mail them to firstname.lastname@example.org.