Right Answer, Wrong Query
Ensure experiments are designed to address the problem at hand
by Christine Anderson-Cook
Recently, I was reminded of one of my favorite statistics-related quotes from John W. Tukey: "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."1
I had just encountered an experimenter with whom I had worked in the past, and he was excited to tell me about an experiment he had run using software to generate an optimal design. What could be better than that?
It eventually came to light that he didn’t know what criterion had been used: Was this a best design for good estimation of model parameters or good prediction of new observations? On what model was the optimization based?
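These two goals correspond to different optimality criteria. As a rough sketch (the two four-run candidate designs and the first-order model below are hypothetical, chosen only for illustration), the D-criterion rewards precise parameter estimation, while the I-criterion rewards low average prediction variance over the design region:

```python
import numpy as np

def model_matrix(points):
    """Expand design points into a first-order model matrix: intercept, x1, x2."""
    return np.column_stack([np.ones(len(points)), points])

def d_criterion(design):
    """D-criterion: determinant of the information matrix X'X.
    Larger values mean lower-variance parameter estimates."""
    X = model_matrix(design)
    return np.linalg.det(X.T @ X)

def i_criterion(design, grid):
    """I-criterion: average prediction variance f(x)'(X'X)^(-1)f(x)
    over a grid spanning the design region. Smaller is better."""
    X = model_matrix(design)
    XtX_inv = np.linalg.inv(X.T @ X)
    F = model_matrix(grid)
    return np.mean(np.einsum("ij,jk,ik->i", F, XtX_inv, F))

# Two hypothetical 4-run designs on the square [-1, 1]^2
factorial = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
clustered = np.array([[-1, -1], [-1, 1], [1, 0], [0.5, 0.5]])

grid = np.array([[a, b] for a in np.linspace(-1, 1, 11)
                        for b in np.linspace(-1, 1, 11)])
for name, d in [("factorial", factorial), ("clustered", clustered)]:
    print(name, round(d_criterion(d), 2), round(i_criterion(d, grid), 3))
```

A design that wins on one criterion need not win on the other, which is exactly why it matters which criterion the software was asked to optimize.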
That brings us to a broader issue related to solving the right problem. One of the core tenets of statistical engineering2,3 is focusing on defining the problem correctly. Choosing the right problem to solve and selecting the right path to the solution are enormously important to the success of the project.
At one time or another, I’m sure we’ve all been associated with a project for which we obtained an elegant and pleasing solution, only to find out later that it did not apply to the actual situation for which it had been designed.
As statisticians, we can play an excellent role as "askers of many questions."4 This can have a powerful impact on understanding, and it keeps us from taking assumptions and starting points for granted.
George Heilmeier developed a helpful list of questions to find the right focus for research efforts.5 The list of nine questions, shown in Table 1, has been adapted by the NASA Langley Statistical Engineering group as the core method for triaging and providing appropriate focus for new projects. These questions force us to identify:
- Precisely what we want to do.
- An important, well-defined question worthy of the resources that will be spent to find the answer.
- Areas where we might get into trouble.
On the surface, the questions seem straightforward and almost obvious, but giving a precise, high-quality answer requires deep thought.
The NASA group has augmented these with questions such as "How well do you need to know?" and "What are the consequences if you are wrong?" to delve into prediction and estimation variance, as well as statistical power, to guide sample-size decisions. All these questions are important for selecting the problem, ensuring the quality of the solution and communicating the importance of the project to others.
Constructing a design
Returning to my friend who had run the experiment, he had been fooled by the label of "optimal" and had jumped in too quickly before sorting out precisely what the goal of the experiment was. As I have previously noted, a good design is rarely as simple as optimizing over a single objective.6
Historically, the choice of designed experiments relied heavily on classical designs that were quite general, had good properties for a wide range of applications and often were designed to take into account several measures of a good design.7
With the advent of readily available computing power, there has been a strong movement toward optimal designs, which select a single objective and find the best design possible based on it. The singularity of purpose of these computer-generated designs has sometimes left experimenters vulnerable to model misspecification when the scenario changes slightly or there are problems with data collection.
Most recently, there has been a movement to consider multiple objectives when constructing a design. All of this is predicated on thinking hard about what the goals of the experiment are, where things might go wrong and what represents a successful outcome. This is a key message: Don’t solve the convenient problem. Solve the actual problem.
There are many experiments for which the goals are quite similar: Estimate a particular model—with some ability to assess the assumptions of the model—and continue to provide unbiased information in the presence of some model misspecification.8
But what should experimenters do if they have a nonstandard problem?
1. Identify the objectives of the experiment. What is really important for the success of the data collection?
2. Find metrics or quantitative criteria that capture the essence of what is important. It is helpful to remember what a metric or statistic should do: "A real statistic measures something useful, not just something that is easy to measure. In this sense, it must have predictive value, which is the acid test of any statistic. If what you are really trying to measure can’t be measured by your methods, you need to improve your methods, not measure something else that is less relevant."9
The predictive value we seek in design-of-experiment problems is good performance based on our priorities of the study when the data are collected.
3. After the metrics of interest have been defined, find an experiment that satisfies these priorities. There are several strategies for translating multiple objectives into a tractable optimization problem. One established approach that works well—if you have precise knowledge about how to weight the different objectives—is a desirability function.10
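As a minimal sketch of the desirability idea: each criterion is mapped onto a common 0-to-1 scale, and the scaled scores are combined with a weighted geometric mean. The designs, score ranges and weights below are hypothetical, and the linear transformation is a simplified version of the Derringer-Suich formulation:

```python
import numpy as np

def desirability(value, low, high, larger_is_better=True):
    """Map a raw criterion value onto a 0-1 desirability scale
    (a simplified linear Derringer-Suich transformation)."""
    d = (value - low) / (high - low)
    if not larger_is_better:
        d = 1.0 - d
    return float(np.clip(d, 0.0, 1.0))

def overall_desirability(scores, weights):
    """Combine per-criterion desirabilities with a weighted geometric
    mean; a zero on any single criterion zeroes the overall score."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.prod(scores ** (weights / weights.sum())))

# Hypothetical candidate designs scored on (D-efficiency %, average prediction variance)
designs = {"A": (95.0, 0.62), "B": (88.0, 0.45)}
weights = [1.0, 2.0]  # assumption: prediction quality matters twice as much here
for name, (d_eff, pred_var) in designs.items():
    d1 = desirability(d_eff, low=80, high=100)                              # estimation
    d2 = desirability(pred_var, low=0.3, high=0.8, larger_is_better=False)  # prediction
    print(name, round(overall_desirability([d1, d2], weights), 3))
```

Note the commitment this approach demands: the ranges and weights encode a precise statement of how much each objective matters, which is exactly the knowledge the method presumes you have.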
Another approach that allows for imprecise knowledge of the relative importance of the different criteria and exploration of robustness uses Pareto fronts.11 This approach also allows you to examine potential different designs to understand the trade-offs between the often-competing objectives.
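The Pareto idea can be sketched in a few lines: a design belongs to the front if no competitor is at least as good on every criterion and strictly better on at least one. The candidate designs and their efficiency scores below are hypothetical, with both criteria oriented so that larger is better:

```python
def pareto_front(candidates):
    """Return the names of candidates not dominated by any other.
    A candidate is dominated if some competitor scores at least as
    well on every criterion and strictly better on at least one."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            all(o >= s for o, s in zip(other, scores))
            and any(o > s for o, s in zip(other, scores))
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical designs scored on (D-efficiency, I-efficiency), larger is better
candidates = {
    "d-optimal": (1.00, 0.78),
    "i-optimal": (0.85, 1.00),
    "compromise": (0.93, 0.91),
    "poor": (0.80, 0.75),
}
print(pareto_front(candidates))  # "poor" is dominated; the other three trade off
```

Rather than forcing a single winner, the front leaves the experimenter with a short list of defensible designs and makes the trade-offs among them explicit.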
The key feature of these approaches is they include the flexibility to specify the problem the experimenter is really interested in solving, not just a "close-enough" approximation with a readily available solution.
The goal should be a precise answer to the right question. If both are not possible, then focusing on the right question should trump the precision of the answer.
Peter Drucker summarized the take-home message: "The most serious mistakes are not made as a result of wrong answers. The truly dangerous thing is asking the wrong questions."12
As early as 1957, A.W. Kimball challenged applied statisticians to avoid "errors of the third kind," defined as giving the right answer to the wrong question.13 Investing in understanding and articulating the key aspects of a study can lead to vastly improved results and better prioritization of how best to spend resources.
References and Note
- John W. Tukey, "Sunset Salvo," The American Statistician, Vol. 40, No. 1, 1986, pp. 72-76.
- Roger W. Hoerl and Ronald D. Snee, "Closing the Gap," Quality Progress, May 2010, pp. 52-53.
- Christine M. Anderson-Cook, Lu Lu, Gordon Clark, Stephanie P. DeHart, Roger W. Hoerl, Bradley Jones, R. Jock MacKay, Douglas C. Montgomery, Peter A. Parker, James Simpson, Ronald D. Snee, Stefan Steiner, Jennifer Van Mullekom, G. Geoff Vining and Alyson G. Wilson, "Statistical Engineering—Forming the Foundations," Quality Engineering, Vol. 24, No. 2, March/April 2012.
- William G. Hunter, "The Practice of Statistics: The Real World Is an Idea Whose Time Has Come," The American Statistician, Vol. 35, 1981, pp. 72-76.
- Joshua Shapiro, "George H. Heilmeier," IEEE Spectrum, Vol. 31, 1994, pp. 56-59.
- Christine M. Anderson-Cook, "A Matter of Trust," Quality Progress, March 2010, pp. 56-58.
- Raymond H. Myers, Douglas C. Montgomery and Christine M. Anderson-Cook, Response Surface Methodology, third edition, Wiley, 2009, p. 282.
- Bradley Jones and Christopher J. Nachtsheim, "A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects," Journal of Quality Technology, Vol. 43, 2011, pp. 1-15. This article shows a flexible class of screening designs that solve this problem well.
- Jeffrey Ma, The House Advantage, Palgrave Macmillan, 2010, p. 116.
- George Derringer and Ronald Suich, "Simultaneous Optimization of Several Response Variables," Journal of Quality Technology, Vol. 12, 1980, pp. 214-219.
- Lu Lu, Christine M. Anderson-Cook and Timothy J. Robinson, "Optimization of Designed Experiments Based on Multiple Criteria Utilizing a Pareto Frontier," Technometrics, Vol. 53, 2011, pp. 353-365.
- Peter F. Drucker, Men, Ideas and Politics, Harvard Business Review Press, 2010.
- A.W. Kimball, "Errors of the Third Kind in Statistical Consulting," Journal of the American Statistical Association, Vol. 52, 1957, pp. 133-142.
Christine M. Anderson-Cook is a research scientist at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of both the American Statistical Association and ASQ.