Know What Statistical Tools You Have and How to Use Them
by Christine M. Anderson-Cook
Recently, there has been exciting discussion about statistical engineering and its potential to redefine the role of statisticians and increase our influence in business and industry through expanded participation in decision making for high-impact problems [1, 2].
For those of you new to the discussion, statistical thinking [3] is the strategic-level thinking that helps us appreciate that statistics is relevant to decision making in the presence of uncertainty, while statistical methods are the operational tools that help us solve problems.
Think of statistical engineering as the tactical glue that joins these two levels into a cohesive plan of action: identifying important problems in which it matters to understand and characterize the patterns we see, and then developing a sequence of steps that moves from problem definition to a complete solution using a combination of relevant tools.
To make a positive impact using statistical engineering, we must be able to use our set of tools in a systematic, sequential way to define, understand, improve and maintain our products and processes. But as with any home improvement project, before you can start fixing things, you need to make sure you have the right tools and that they are organized in your toolbox so you can find and use them at the right moment.
Each of us has a slightly different collection of tools in our toolbox, shaped by our work experience and formal training. It also helps to have friendly, generous neighbors with bigger sets of tools who are willing to share.
But a big part of any successful do-it-yourself project is knowing what tool you need and when you need it, and being able to get a new, specialized tool when the job calls for it. It’s fine to say, "I need a thingamajig," as long as you can describe what you need it to do.
Is your statistical toolbox well organized and complete? Do you have a mental organizational structure to help you determine if you need a hammer, a screwdriver or a pair of pliers to tackle a particular problem?
When I first started working as a statistical consultant many years ago, I discovered that the most difficult problems did not begin with a question from a scientist or engineer such as, "How do I do a two-sample t-test for this set of data?" Instead, they were more along the lines of, "I’m having problems understanding why I am getting this unintuitive result."
Often, I felt overwhelmed about where to start, and it did not seem that a single tool would be enough to solve the problem at hand. For me, the big breakthrough came when a wise colleague encouraged me to take stock of my statistics toolbox.
One summer as a graduate student, I literally did that: I started by identifying categories of problems, types of data and types of answers that might be sought, and then I proceeded to populate a multi-dimensional table with methods I had learned in various undergraduate and graduate classes.
To make this more concrete, here are a few of the dimensions of my organizational structure. One dimension focuses on the type of problem to be solved:
- Data collection: design of experiments and sampling.
- Exploratory methods: checking for patterns in data and computing basic summaries or characteristics.
- Formal analysis: hypothesis testing, estimating characteristics or model parameters.
A separate dimension considers the type of data involved: continuous, ordinal or nominal. Yet another dimension considers whether there is a natural response (or responses) you wish to describe as a function of one or more explanatory variables, or whether all the data are on equal footing with no natural response.
In the response (y)/explanatory variable (x) case, each of x and y can be continuous, ordinal or nominal, in any combination. Figure 1 shows a sample of the crossed categories.
Within each cell, there are further subcategories to consider, such as parametric versus nonparametric methods, and graphical, numerical or both. Clearly, there are many potential labels or dimensions on which to structure such a framework. I would hazard a guess that if we could peer into the minds of other statisticians, we would find that the organization looks quite different from person to person.
What is important, however, is having a structure that feels natural to you and is rich enough to hold the statistical tools you have.
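As a rough illustration of how quickly crossed dimensions multiply, here is a minimal Python sketch that enumerates the cells of such a table for the response/explanatory case. The category labels come from this column; representing them as tuples, crossing them with `itertools.product`, and the function name `crossed_cells` are assumptions made for illustration, not the author's actual framework.

```python
from itertools import product

# Dimension labels taken from the column; the tuple representation and the
# crossing with itertools.product are illustrative assumptions.
PROBLEM_CATEGORIES = ("data collection", "exploratory methods", "formal analysis")
DATA_TYPES = ("continuous", "ordinal", "nominal")
APPROACHES = ("parametric", "nonparametric")
OUTPUT_FORMS = ("graphical", "numerical", "both")


def crossed_cells(with_subcategories=False):
    """Enumerate (problem category, y type, x type) cells for the
    response/explanatory case, optionally split by within-cell subcategories."""
    dims = [PROBLEM_CATEGORIES, DATA_TYPES, DATA_TYPES]
    if with_subcategories:
        dims += [APPROACHES, OUTPUT_FORMS]
    return list(product(*dims))


print(len(crossed_cells()))                         # 3 * 3 * 3 = 27
print(len(crossed_cells(with_subcategories=True)))  # 27 * 2 * 3 = 162
```

Even this coarse version, which treats x as a single variable and ignores the no-natural-response case, already yields 162 cells once the within-cell subcategories are included.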
Dealing with gaps
After I completed this exercise, I found that some interesting byproducts had emerged. Not only did I have several sheets of paper reminding me of everything I knew, but I had also created a framework to help me organize all future additions to my toolbox.
It also highlighted the blank cells in my array, which sometimes led me to think of a scenario in which that configuration of choices might occur and how I might go about finding a solution. Perhaps most importantly, when I was confronted with a new problem, I had a starting point for sorting out how to begin finding a solution.
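The two uses described above, spotting blank cells and looking up a starting point for a new problem, can be sketched in a few lines of Python. This is a hypothetical, deliberately incomplete toolbox: the names `toolbox`, `blank_cells` and `candidate_tools` are invented for illustration, and the method placements are common textbook examples rather than the author's own table.

```python
from itertools import product

PROBLEM_CATEGORIES = ("exploratory methods", "formal analysis")
DATA_TYPES = ("continuous", "ordinal", "nominal")

# Hypothetical, partially filled toolbox keyed by (problem category, y type, x type).
toolbox = {
    ("exploratory methods", "continuous", "continuous"): ["scatterplot"],
    ("exploratory methods", "continuous", "nominal"): ["side-by-side boxplots"],
    ("formal analysis", "continuous", "continuous"): ["simple linear regression"],
    # The two-sample t-test applies when the nominal x has exactly two levels.
    ("formal analysis", "continuous", "nominal"): ["two-sample t-test", "one-way ANOVA"],
    ("formal analysis", "nominal", "nominal"): ["chi-square test of independence"],
    ("formal analysis", "nominal", "continuous"): ["logistic regression"],
}


def blank_cells(toolbox):
    """Return every crossed cell that has no method recorded yet."""
    return [
        cell
        for cell in product(PROBLEM_CATEGORIES, DATA_TYPES, DATA_TYPES)
        if not toolbox.get(cell)
    ]


def candidate_tools(toolbox, category, y_type, x_type):
    """Return the recorded methods for one cell (an empty list if it is blank)."""
    return toolbox.get((category, y_type, x_type), [])


print(len(blank_cells(toolbox)))  # 18 cells, 6 filled, so 12 blanks
print(candidate_tools(toolbox, "formal analysis", "continuous", "nominal"))
```

A plain dictionary keeps the sketch simple; a blank cell is simply a missing key, which makes the gaps easy to list and new methods easy to add as the toolbox grows.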
As a professor at Virginia Tech in Blacksburg, I taught a statistical consulting class several times. One of the assignments asked students to first identify categories (which they found surprisingly difficult) and then populate their own array with what they knew. After a first pass, we held a group discussion, and the students were encouraged to add methods they knew but had forgotten to include.
I must confess that, as much as I thought this was a good exercise for them to work through, I was also quite curious to see how differently the results would turn out. A few things became clear to me:
- There are many sensible frameworks on which to organize our tools.
- The students often did not remember a substantial number of the tools they had, which might limit their options when trying to solve real-world problems.
- The completeness and accuracy of the assignment were very strongly correlated with how well the students did in their studies.
There is a bit of a chicken-and-egg problem with the third point: Were the students capable because they had a good framework, or did having a good framework help them do well? Regardless of which is true, or whether there is some truth to both, we can all improve our problem-solving and statistical engineering skills by having a good handle on what tools are in our statistical toolbox.
1. Roger W. Hoerl and Ronald D. Snee, "Moving the Statistics Profession Forward to the Next Level," The American Statistician, February 2010, pp. 10-14.
2. Roger W. Hoerl and Ronald D. Snee, "Closing the Gap," Quality Progress, May 2010, pp. 52-53.
3. Roger W. Hoerl and Ronald D. Snee, Statistical Thinking: Improving Business Performance, Duxbury Press, 2002.
Christine M. Anderson-Cook is a research scientist at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of the American Statistical Association and a senior member of ASQ.