2019

STATISTICS ROUNDTABLE

# Right Answer, Wrong Query

## Ensure experiments are designed to address the problem at hand

by Christine Anderson-Cook

Recently, I was reminded of one
of my favorite statistics-related quotes from John W. Tukey:
"Far better an approximate answer to the right question, which is often vague,
than an exact answer to the wrong question, which can always be made precise."^{1}

I had just encountered an experimenter with whom I had worked in the past, and he was excited to tell me about an experiment he had run using software to generate an optimal design. What could be better than that?

It eventually came to light that he didn’t know what criterion had been used: Was this a best design for good estimation of model parameters or good prediction of new observations? On what model was the optimization based?

That brings us to a broader issue related
to solving the right problem. One of the core tenets of statistical engineering^{2,3} is focusing on defining the problem correctly.
Choosing the right problem to solve and selecting the right path to the
solution are enormously important to the success of the project.

At one time or another, I'm sure we've all been associated with a project for which we obtained an elegant and pleasing solution, only to find out later it did not apply to the actual situation for which it had been designed.

As statisticians, an excellent role to play
is as "askers of many questions."^{4} This can have a powerful impact
on understanding, and on not taking assumptions and starting points for granted.

George Heilmeier
developed a helpful list of questions to find the right focus for research
efforts.^{5} The list of nine questions, shown in Table 1, has been
adapted by the NASA Langley Statistical Engineering group as the core method
for triaging and providing appropriate focus for new projects. These questions
force us to identify:

- Precisely what we want to do.
- An important, well-defined question worthy of the resources that will be spent to find the answer.
- Areas where we might get into trouble.

On the surface, the questions seem straightforward and almost obvious, but giving a quality precise answer requires deep thought.

The NASA group has augmented these with questions such as "How well do you need to know?" and "What are the consequences if you are wrong?" to delve into prediction and estimation variance, as well as the statistical power needed to guide sample-size decisions. All these questions are important for problem selection, quality of solution and being able to communicate the importance of the project to others.

### Constructing a design

Returning to my friend who had run the experiment,
he had been fooled by the label of "optimal" and had jumped in too quickly
before sorting out precisely what the goal of the experiment was. As I have
previously noted, a good design is rarely as simple as optimizing over a single
objective.^{6}

Historically, the choice of designed
experiments relied heavily on classical designs that were quite general, had
good properties for a wide range of applications and often were designed to
take into account several measures of a good design.^{7}

With the advent of readily available computer power, there has been a strong movement toward optimal designs, which select a single objective and find the best design possible based on it. The singularity of purpose of these computer-generated designs has sometimes left experimenters vulnerable to model misspecification when slight changes occur in the scenario, or when there are problems with data collection.
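To make the point concrete, here is a small sketch (my own illustration, not from the column) comparing two hypothetical five-run designs for a one-factor quadratic model under two different criteria: a D-style criterion for parameter estimation and an average-prediction-variance criterion for prediction. The designs and the model are assumptions for illustration; note that the two criteria can rank the same designs differently.

```python
import numpy as np

def model_matrix(design):
    """Expand a one-factor design into a quadratic model matrix [1, x, x^2]."""
    x = np.asarray(design, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

def d_criterion(design):
    """D-style criterion: determinant of the information matrix X'X (larger is better)."""
    X = model_matrix(design)
    return float(np.linalg.det(X.T @ X))

def avg_prediction_variance(design, grid=np.linspace(-1, 1, 101)):
    """I-style criterion: average scaled prediction variance over the
    design region [-1, 1] (smaller is better)."""
    X = model_matrix(design)
    XtX_inv = np.linalg.inv(X.T @ X)
    Xg = model_matrix(grid)
    return float(np.mean(np.sum((Xg @ XtX_inv) * Xg, axis=1)))

design_a = [-1, -1, 0, 1, 1]      # replicated end points
design_b = [-1, -0.5, 0, 0.5, 1]  # evenly spread points

for name, d in [("A", design_a), ("B", design_b)]:
    print(name, d_criterion(d), avg_prediction_variance(d))
```

Design A wins on the estimation criterion while design B wins on average prediction variance, which is exactly why it matters to know which criterion the software optimized.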

Most recently, there has been a movement to consider multiple objectives when constructing a design. All of this is predicated on thinking hard about what the goals of the experiment are, where things might go wrong and what represents a successful outcome. This is a key message: Don’t solve the convenient problem. Solve the actual problem.

There are many experiments for which the
goals are quite similar: Estimate a particular model—with some ability to
assess the assumptions of the model—and continue to provide unbiased information
in the presence of some model misspecification.^{8}

But what should experimenters do if they have a nonstandard problem?

**1. Identify the objectives of the
experiment.** What is really important for the success of the data
collection?

**2. Find metrics or
quantitative criteria that capture the essence of what is important.** It is helpful to remember what a metric or
statistic should do: "A real statistic measures something useful, not just
something that is easy to measure. In this sense, it must have predictive
value, which is the acid test of any statistic. If what you are really trying
to measure can’t be measured by your methods, you need to improve your methods,
not measure something else that is less relevant."^{9}

The predictive value we seek in design-of-experiment problems is good performance based on our priorities of the study when the data are collected.

**3. After the metrics
of interest have been defined, find an experiment that satisfies these
priorities.** There are several
strategies for translating multiple objectives into a tractable optimization
problem. One established approach that works well—if you have precise
knowledge about how to weight the different objectives—is a desirability
function.^{10}
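A desirability function of the Derringer-Suich style maps each criterion onto a 0-to-1 scale and combines the pieces with a geometric mean, so that a design that fails badly on any one objective scores zero overall. The sketch below is a minimal illustration with made-up criterion values and target ranges; the weights and bounds are assumptions you would set from your own priorities.

```python
import math

def desirability_larger_is_better(y, low, high, weight=1.0):
    """Map a larger-is-better criterion value to [0, 1]:
    0 at or below `low`, 1 at or above `high`, a power curve in between."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight

def overall_desirability(ds):
    """Combine individual desirabilities with a geometric mean;
    any single 0 drives the overall score to 0."""
    return math.prod(ds) ** (1.0 / len(ds))

# Hypothetical scores for one candidate design on two criteria
d_estimation = desirability_larger_is_better(16.0, low=5.0, high=20.0)
d_prediction = desirability_larger_is_better(0.8, low=0.0, high=1.0)
print(overall_desirability([d_estimation, d_prediction]))
```

The catch the column notes: this works well only when you can state fairly precisely how the objectives should be weighted and bounded before looking at candidate designs.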

### Pareto preference

Another
approach that allows for imprecise knowledge of the relative importance of the
different criteria and exploration of robustness uses Pareto fronts.^{11}
This approach also allows you to examine potential different designs to
understand the trade-offs between the often-competing objectives.
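The mechanics of a Pareto front are simple to sketch: a candidate design stays on the front unless some other candidate is at least as good on every criterion and strictly better on at least one. The candidates and scores below are hypothetical, purely to show the dominance check.

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other.
    Each candidate is (name, (score1, score2, ...)); all scores are maximized."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [c for c in candidates
            if not any(dominates(o[1], c[1]) for o in candidates if o is not c)]

# Hypothetical design scores: (estimation quality, prediction quality)
designs = [("A", (16.0, 0.60)), ("B", (11.0, 0.85)), ("C", (10.0, 0.55))]
print([name for name, _ in pareto_front(designs)])  # C is dominated, so A and B remain
```

Designs A and B each win on a different criterion, so both survive; examining such survivors side by side is what lets you study the trade-offs before committing to weights.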

The key feature of these approaches is they include the flexibility to specify the problem the experimenter is really interested in solving, not just a "close-enough" approximation with a readily available solution.

The goal should be a precise answer to the right question. If both are not possible, then focusing on the right question should trump the precision of the answer.

Peter Drucker
summarized the take-home message: "The most serious mistakes are not made as a
result of wrong answers. The truly dangerous thing is asking the wrong
questions."^{12}

As early as 1957, A.W. Kimball challenged
applied statisticians to avoid "errors of the third kind," defined as
giving the right answer to the wrong question.^{13} Investing in
understanding and articulating the key aspects of a study can lead to vastly
improved results and important prioritization of how to best spend
resources.

**References and Note**

1. John W. Tukey, "Sunset Salvo," *The American Statistician*, Vol. 40, No. 1, 1986, pp. 72-76.
2. Roger W. Hoerl and Ronald D. Snee, "Closing the Gap," *Quality Progress*, May 2010, pp. 52-53.
3. Christine M. Anderson-Cook, Lu Lu, Gordon Clark, Stephanie P. DeHart, Roger W. Hoerl, Bradley Jones, R. Jock MacKay, Douglas C. Montgomery, Peter A. Parker, James Simpson, Ronald D. Snee, Stefan Steiner, Jennifer Van Mullekom, G. Geoff Vining and Alyson G. Wilson, "Statistical Engineering—Forming the Foundations," *Quality Engineering*, Vol. 24, No. 2, March/April 2012.
4. William G. Hunter, "The Practice of Statistics: The Real World Is an Idea Whose Time Has Come," *The American Statistician*, Vol. 35, 1981, pp. 72-76.
5. Joshua Shapiro, "George H. Heilmeier," *IEEE Spectrum*, Vol. 31, 1994, pp. 56-59.
6. Christine M. Anderson-Cook, "A Matter of Trust," *Quality Progress*, March 2010, pp. 56-58.
7. Raymond H. Myers, Douglas C. Montgomery and Christine M. Anderson-Cook, *Response Surface Methodology*, third edition, Wiley, 2009, p. 282.
8. Bradley Jones and Christopher J. Nachtsheim, "A Class of Three-Level Designs for Definitive Screening in the Presence of Second-Order Effects," *Journal of Quality Technology*, Vol. 43, 2011, pp. 1-15. This article shows a flexible class of screening designs that solve this problem well.
9. Jeffrey Ma, *The House Advantage*, Palgrave Macmillan, 2010, p. 116.
10. George Derringer and Ronald Suich, "Simultaneous Optimization of Several Response Variables," *Journal of Quality Technology*, Vol. 12, 1980, pp. 214-219.
11. Lu Lu, Christine M. Anderson-Cook and Timothy J. Robinson, "Optimization of Designed Experiments Based on Multiple Criteria Utilizing a Pareto Frontier," *Technometrics*, Vol. 53, 2011, pp. 353-365.
12. Peter F. Drucker, *Men, Ideas and Politics*, Harvard Business Review Press, 2010.
13. A.W. Kimball, "Errors of the Third Kind in Statistical Consulting," *Journal of the American Statistical Association*, Vol. 68, 1957, pp. 133-142.

**Christine M. Anderson-Cook** is a research scientist at Los
Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in
statistics from the University of Waterloo in Ontario. Anderson-Cook is a
fellow of both the American Statistical Association and ASQ.
