It Isn’t There
More careful, scientific planning can motivate discovery
by Lynne B. Hare
Macavity’s a mystery cat, he’s called the hidden paw
For he’s a master criminal who can defy the law
He’s the bafflement of Scotland Yard, the Flying Squad’s despair
For when they reach the scene of crime, Macavity’s not there! —T.S. Eliot
"Cats," Andrew Lloyd Webber’s musical—based on T.S. Eliot’s poetry with a bow toward Arthur Conan Doyle’s Sherlock Holmes stories featuring the elusive Moriarty—tells the story of the night a cluster of neighborhood cats decides which among them will ascend to the great beyond.
The elusive Macavity, who appears only briefly, has kidnapped Old Deuteronomy. As the name of the fifth book of the Torah, Deuteronomy symbolizes the comforting creeds and teachings of Moses as embraced by Judaism, Christianity and Islam. The disappearance of the cultural foundation naturally results in the cluster’s disarray, cat-alyzing (sorry) the story’s tension.
Similarly, our scientific and cultural disarray causes confusion and frustration. Sometimes the answer isn’t there. But channeled properly, confusion accompanied by careful, scientific planning can motivate discovery and well-being.
Following are three examples I thought worthy of attention.
Incredibly huge masses of data have accumulated since computers began to take over much of our lives. I’ve had plant managers tell me that there is no need for special, designed studies because they have all the data they need. Every little jot and tittle has been entered into the data bank. Just work your statistical magic and tell us what adjustments we must make to improve productivity by 10%.
The plant manager isn’t wrong. There might be something worth pursuing among all that data, and a strategic approach as described by statistician Ron Snee is in order.1 He guides us through the building blocks of analytics, making a distinction between observational and experimental data, discussing quantitative and qualitative data with exploration and modeling techniques for each, stirring in process knowledge, and recommending types and amounts of additional data to be taken.
Why additional data? Verification. You see, the data in the data bank were used to help run the process. When you’re running the process, you don’t color outside the lines. You follow the rules, and you enter what you see—well, for the most part.2
Your real-time data tell you what action to take and what adjustments to make when you see a particular result: if A goes up, you adjust B to the left, and so on. Real-time adjustments and other in-process changes induce correlations among the measured responses that make up the data bank, as Snee points out. The best statistical methods cannot overcome the resulting disarray. So yes, verification of any result found is essential to snare the elusive.
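A toy simulation can make the point concrete. This is a hypothetical sketch, not anything from the plant floor: in this model, response A is pure noise and knob B has no effect on it at all, yet because the operator's rule turns B down whenever A drifts up, the historical data bank shows a strong negative correlation between the two.

```python
import random

# Hypothetical feedback rule: turn knob B down whenever response A goes up.
# B does nothing to A in this model; the correlation in the "data bank"
# is written there entirely by the adjustment rule.
random.seed(1)

a_values, b_values = [], []
for _ in range(1000):
    a = random.gauss(0, 1)                   # response: pure noise here
    b = -0.8 * a + random.gauss(0, 0.3)      # operator's adjustment rule
    a_values.append(a)
    b_values.append(b)

# Sample correlation between the response and the adjustments
n = len(a_values)
mean_a, mean_b = sum(a_values) / n, sum(b_values) / n
cov = sum((x - mean_a) * (y - mean_b)
          for x, y in zip(a_values, b_values)) / n
sd_a = (sum((x - mean_a) ** 2 for x in a_values) / n) ** 0.5
sd_b = (sum((y - mean_b) ** 2 for y in b_values) / n) ** 0.5
corr = cov / (sd_a * sd_b)
print(f"correlation between A and B: {corr:.2f}")
```

An analyst mining these records could easily conclude that B drives A, when the relationship is an artifact of running the process. That is why verification with fresh, designed data matters.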
Further, suppose the keys to improving productivity by 10%, as requested, involve using process settings never used before. The answers to the questions lie outside the data bank. They are not there, and broader, more informative studies must be designed.
Have you ever had a performance review with which you have been fully satisfied? Not likely. And even if 10 positive things are said, you dwell on the one negative. Besides, how can your boss possibly know all the things—many beyond your control—that influence your ability to do the job? The scoring system doesn’t really measure your performance. How can performance be reduced to one number? And why would anyone assume that the number is normally distributed? Does it make scientific or business sense that those in the lower tail of the distribution be put on the report?3
Some years ago, I served on a committee with my company’s CEO. A conversation drifted to performance appraisal, and I summoned the courage to recommend its abolishment. He looked surprised and asked patiently why I thought that.
What I thought but didn’t say is that performance appraisal is highly multivariate and conditional. To reduce it to a single number is to deprive it of any real meaning. That’s a bit of statistical sacrilege. Instead, I asked him if he had ever been satisfied with his own. He admitted that he had not been, and a few days later (and much to the chagrin of the HR vice president), they were gone. The CEO understood that they don’t really measure performance.
A few years later, they were back. Why? The lawyers pointed out that if the organization wanted to get rid of someone, it needed to build a case. There had to be evidence. By that time, my committee work was done, so I didn’t have an opportunity to respond. But I learned from this example that performance appraisal is often a facade used to protect an organization from its employees.
Maybe it should be given another name because, as it stands, it is not and never could be a measure of performance. The information isn’t there.
DoEs and population studies
When statistics teachers come to the topic of designed experiments, they frequently focus on two-level factorial designs and their fractions. A favorite is the 2³ factorial experiment, perhaps because it ventures into the third dimension and because it permits estimation of the effects of two-factor interactions.
Sometimes, students come away thinking that it is an ideal design. They randomize the eight experimental treatment combinations, take data as in Figure 1, and proceed to the analysis phase only to discover that nothing—no main effect or two-factor interaction—is significant.
This finding may be counterintuitive and inconsistent with subject matter knowledge. What went wrong? One strong possibility is the lack of an appropriate error term. There is no random component to the experiment such as might be afforded by replication.
But wait, the teacher said the data could be analyzed using normal probability plots of the effects. Isn’t that true? Yes, but with only eight experimental treatment combinations, there may not be sufficient power in the experiment to find important differences.
If the experiment had more factors (therefore more observations), some of the higher order interactions could be taken to represent random variation, and there would be more power to detect differences. As the experiment stands, the information isn’t there.
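The arithmetic behind those effect estimates is simple enough to sketch. The eight responses below are invented for illustration; everything else follows from the coded ±1 design columns. Note what the sketch cannot do: with eight runs and seven effects, every degree of freedom is used up, so there is no error term left to test any effect against.

```python
import itertools

# Unreplicated 2^3 factorial with coded levels -1/+1 for factors A, B, C.
# The eight responses are invented for illustration only.
runs = list(itertools.product([-1, 1], repeat=3))  # (A, B, C) for each run
y = [60, 72, 54, 68, 52, 83, 45, 80]

def effect(contrast):
    """Average response at the +1 level minus average at the -1 level."""
    plus = [yi for c, yi in zip(contrast, y) if c == 1]
    minus = [yi for c, yi in zip(contrast, y) if c == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# Main-effect contrasts are the design columns themselves; interaction
# contrasts are elementwise products of those columns.
cols = {
    "A": [r[0] for r in runs],
    "B": [r[1] for r in runs],
    "C": [r[2] for r in runs],
}
cols["AB"] = [a * b for a, b in zip(cols["A"], cols["B"])]
cols["AC"] = [a * c for a, c in zip(cols["A"], cols["C"])]
cols["BC"] = [b * c for b, c in zip(cols["B"], cols["C"])]
cols["ABC"] = [ab * c for ab, c in zip(cols["AB"], cols["C"])]

for name in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    print(name, round(effect(cols[name]), 2))
```

A normal probability plot of these seven estimates is the usual workaround, but as noted above, with so few effects the plot may simply lack the power to single out the important ones.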
Failure to detect expected differences, however, is a much larger problem than that. You may have read recently about the inability of scientists to reproduce results of earlier large-scale sociological and pharmacological studies.
Why might this be so? Are the laws of probability no longer in force?
Herbert I. Weisberg, in his book Willful Ignorance,4 shows the webs unintentionally woven by statistical practitioners to ensnare many technologies in mediocrity. He shows that blindly following main effect studies and relying on p-values alone have left many to wonder why they are unable to reproduce earlier results.
The missing component is ambiguity, a fact that might easily have gone unrecognized had Weisberg not guided us back to the foundations of probability as first considered in the 17th-century dialogue between Pascal and Fermat and modified through the succeeding centuries by great minds such as the Bernoullis, de Moivre, Bayes, Price and Laplace, then into the more modern era of statistical thinking through the contributions of such notables as Keynes, Fisher, Gosset, Neyman and Pearson.
The Fisher versus Neyman-Pearson split is the major divide: building knowledge from continual iterations between synthesis and analysis, and making decisions based on the knowledge gained, as opposed to guiding decisions on the basis of p-values alone. In a nutshell, it says we have not been sufficiently diligent in our studies.
There are two key points. One is that study results apply to the mean, not to individuals. The other is that our tests are too simple. Market pressures promote quick and simple results, whereas the scientific realities are too complex for that.
For example, the efficacy of a drug, especially a new drug with complex molecules, depends on many more factors than we are inclined to build into a test. The world, in its growing complexity, can no longer allow us to assign a hypothesis to a simple main effect and relegate everything else through randomization to "willful ignorance."
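A hypothetical simulation makes the "mean, not individuals" point vivid. The mixture proportions and effect sizes below are invented: a drug helps most patients but harms a sizable minority, and the single-number summary that a main-effect study reports, the mean, hides that split entirely.

```python
import random

# Invented mixture: 70% of patients benefit (mean effect +2.0),
# 30% are actually harmed (mean effect -1.5). Units are arbitrary.
random.seed(42)

effects = []
for _ in range(10_000):
    if random.random() < 0.7:
        effects.append(random.gauss(2.0, 1.0))   # responders
    else:
        effects.append(random.gauss(-1.5, 1.0))  # harmed subgroup

mean_effect = sum(effects) / len(effects)
harmed = sum(1 for e in effects if e < 0) / len(effects)
print(f"mean effect {mean_effect:.2f}, fraction harmed {harmed:.0%}")
```

The study "works": the mean effect is clearly positive and a p-value would be tiny. Yet roughly a third of individuals are worse off, and nothing in the one-number result reveals which third. The subgroup structure, the ambiguity, was relegated to willful ignorance.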
In such simple tests, the needed information isn’t there.
"Cats" has a satisfying ending. Macavity comes to a literally shocking demise (electrocuted by a generator), and Deuteronomy is magically returned to the neighborhood.
It is doubtful that our disarrays will be magically resolved. More careful, scientific planning is needed. Otherwise, the answer, like Macavity, isn’t there.
1. Ronald D. Snee, "A Practical Approach to Data Mining: I Have All These Data; Now What Should I Do?" Quality Engineering, Vol. 27, No. 4, 2015, pp. 477-487.
2. Lynne B. Hare, "Are the Data the Data?" Quality Progress, August 2015, pp. 50-52.
3. Peter R. Scholtes, The Leader’s Handbook, McGraw-Hill, 1998.
4. Herbert I. Weisberg, Willful Ignorance: The Mismeasure of Uncertainty, Wiley, 2014.
Lynne B. Hare is a statistical consultant. He holds a doctorate in statistics from Rutgers University in New Brunswick, NJ. He is past chairman of the ASQ Statistics Division and a fellow of ASQ and the American Statistical Association.