3.4 PER MILLION
An Everyman’s Guide to Handling Data
Statistical practitioners should be team players, not shamans
by Joseph D. Conklin
I have been an applied statistician for many years. Regardless of the industry or project I encounter, the most common request I get is, "How big a sample size do I need?" That one is usually fairly straightforward, but not always. The second most common request I get is, "Analyze this data, and tell me what it means." That one excites and scares me.
The exciting part is the chance to help a coworker solve a problem. The scary part is the unstated assumption sometimes lurking in the background: The meaning lies like gold nuggets waiting for only the right statistical dowsing rod to reveal it. I call this the "statistician as shaman" theory. This road should be avoided at all costs.
Problems occur when people fail to understand that data are not self-interpreting; they depend on some prior context. Statistical practitioners should ceaselessly prompt the customer about why the data exist and how they are generated before beginning analysis. In this way, the statistician becomes an extension of the customer’s team. I call this the "statistician as a team player" theory. On that road lie many more rewards.
Making sense of marketing
How this plays out varies case by case. To illustrate the concept more clearly, follow the story of Mark from marketing.
Mark brought a 20-year record of the company’s success rate with marketing plans (shown in Table 1) to my office. I asked him, "Why does this data exist?" He said, "A while back, senior management started requiring departments to justify their activities with data. The marketing department measures the performance of its marketing plans to justify the budget for continuing successful plans."
"Who owns this data, who manages it, and what does ‘successful’ mean?" I asked.
"The vice president of marketing is the official owner, but it’s his direct reports who collect, check and store it," Mark explained. "Success is a financial measure. We know what the sales are before the plan starts. Depending on the product, we measure sales a certain number of months after the plan ends. If the sales at the end exceed the value at the beginning by at least the cost of the plan plus a little extra, the plan is considered successful."
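Mark’s success rule amounts to a simple comparison. The sketch below is an illustration of that logic only; the function name, the argument names and the `margin` parameter (standing in for the "little extra" Mark mentions) are assumptions, not the marketing department’s actual formula:

```python
def plan_is_successful(sales_before, sales_after, plan_cost, margin=0.0):
    """Return True if the sales gain covers the plan's cost plus a margin.

    `margin` is a stand-in for the "little extra" in Mark's description;
    its real value is an assumption, not something stated in the article.
    """
    return (sales_after - sales_before) >= (plan_cost + margin)

# A plan costing 50 that lifts sales from 1,000 to 1,075, judged with a
# margin of 10: the gain of 75 exceeds 50 + 10, so it counts as a success.
print(plan_is_successful(1000, 1075, 50, margin=10))
```

For new products, the same check would apply with `sales_before` replaced by the target or competitor estimate Mark describes later.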
"How do you know the numbers are right?" I responded. "I can imagine the possibility of some subtle and not-so-subtle pressure to pay less attention or care to the numbers for the unsuccessful plans."
Mark replied, "For the first few years, senior management tended to accept the marketing department’s numbers without question. After we merged with another company in the early 1990s, the success rate took a noticeable dip.
"The first instinct was to blame the merger, but the new senior executives were puzzled by that theory because most of the mid-level people were still in place, and there had not been that much time to alter how the plans were devised. Accounting started taking a closer look at our numbers before we sent the results.
"This happened before I started here, so a lot of the story comes from second- or third-hand accounts. A few people investigated the matter, but efforts were not well coordinated or documented. Based on the scant material left in our files, it looks like the responsibility for computing the success rate was transferred to somebody new shortly before the merger.
"This person read the company policies for cost accounting and decided on his own that the fixed costs for the marketing plans should be prorated according to the sales volume and included in computing the success rate. This had the effect of increasing the total costs and, hence, the level of sales needed to consider any given plan a success."
"Is that the present policy?" I asked.
"We can’t find any official memo making it a required practice. It has continued as an unquestioned, handed-down tradition," Mark replied.
"Because accounting isn’t my specialty, I can’t weigh in one way or the other," I said. "It sounds like this practice is a good candidate for review by the right people. It provides evidence that at least some of the movement in this variable is not directly tied to the market forces influencing demand for the products. Does accounting continue to check your cost figures?"
"Yes, at least for the plans for the highest-profile products or with the highest total costs. It’s not perfect, but I think it does a good job of keeping my department honest," Mark said.
"They write up the results of their checks in case questions come up down the road."
"Those reports might be useful for analyzing the success rate," I said. "But how do you measure success for new products where there are no existing levels of sales?"
"There are a few basic options," Mark replied. "We can substitute some target level of sales the marketing department believes the company should be able to achieve. If our product is new but goes against a competitor’s existing product, we may substitute our best estimate of the current sales of the competing product. Neither choice affects how success is evaluated."
"One simple analysis we can do right away is plot the overall success rate over time," I said, producing the run chart shown in Figure 1. "I can see that downturn in the success rate we just talked about. During roughly the past 10 years, it looks like the rate has gradually improved."
I redrew the run chart to show only the past 11 years, as shown in Figure 2. Mark traced the gradual rise from approximately 75% to 90% and grew excited. He wanted to check out different theories to explain it.
"First," I said, "let’s rule out the possibility of statistical noise. If that is really what’s going on, we may be chasing a ghost. The trend, if real, appears to follow a basic linear pattern."
I fitted a linear regression using the data-analysis toolkit in Microsoft Excel. The independent variable was the year, and the dependent variable was the success rate. The results appear in Table 2. There is strong evidence of a real linear trend.
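The mechanics of that fit can be sketched outside Excel as well. Because Table 2 and the underlying figures are not reproduced here, the success rates below are made-up illustrative values shaped like the Figure 2 pattern, not the article’s actual data:

```python
import numpy as np

# Hypothetical success rates (%) for 11 consecutive years -- illustrative
# stand-ins for the Figure 2 data, not the article's actual values.
years = np.arange(1995, 2006)
rates = np.array([75.0, 76.5, 78.0, 79.0, 81.0, 82.5,
                  84.0, 85.5, 87.0, 88.5, 90.0])

slope, intercept = np.polyfit(years, rates, 1)   # least-squares line
predicted = intercept + slope * years
residuals = rates - predicted                    # inputs to the control charts

print(f"slope = {slope:.2f} percentage points per year")
```

With these illustrative numbers, the fitted slope works out to roughly 1.5 percentage points per year; a formal test of that slope against zero is what rules out statistical noise.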
A byproduct of testing for the linear trend is the predicted values for the success rate. The differences between the actual and predicted rates—the residuals—offer grist for additional checks on what may be driving the data over time.
These take the form of the moving range and individuals control charts in Figures 3 and 4.¹ They suggest nothing more complicated than a linear trend is needed to account for the change in the success rate over time.
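The limits behind those charts follow the standard individuals and moving range (I-MR) calculations. A minimal sketch, assuming the usual constants for moving ranges of two consecutive points (2.66 for the individuals limits, 3.267 for the moving range limit); the residual values fed in below are invented for illustration:

```python
import numpy as np

def imr_limits(values):
    """Compute control limits for individuals and moving range (I-MR) charts.

    Standard constants for n=2 moving ranges:
    individuals limits = mean +/- 2.66 * MRbar; MR upper limit = 3.267 * MRbar.
    Returns (lower, center, upper) tuples for each chart.
    """
    values = np.asarray(values, dtype=float)
    mr = np.abs(np.diff(values))     # moving ranges of consecutive points
    mr_bar = mr.mean()
    center = values.mean()
    return {
        "individuals": (center - 2.66 * mr_bar, center, center + 2.66 * mr_bar),
        "moving_range": (0.0, mr_bar, 3.267 * mr_bar),
    }

# Hypothetical regression residuals; points inside the limits with a
# random-looking trace support the "linear trend is enough" reading.
limits = imr_limits([0.4, -0.3, 0.1, -0.5, 0.2, 0.3, -0.2])
```

Points outside these limits, or obvious runs and patterns within them, would signal that something beyond the linear trend is moving the success rate.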
"Now we can propose and think about testing different theories to account for the rising trend over time," I said. "Have there been any changes to the internal policies and procedures for computing the success rate since that incident following the merger?"
"I don’t think so, but I will verify that before doing anything else," Mark promised.
If the success rate is still computed the same way, there are various theories that could explain a rising trend in the success rate over time:
- The company’s ability to design marketing plans has improved.
- The company’s ability hasn’t changed, but its particular products are more popular.
- The company’s products are just as popular, but the competition has become weaker.
- Steady economic growth has raised the demand for all like products in the market.
Mark and I adjourned our first meeting so he could do his promised research and come back with his own theories, which we could figure out how to test later. He carried with him some questions that would help anyone trying to make sense of data:
- Why are the data important?
- Who produces the data?
- What secondary purposes or audiences do the data serve?
- How are the data produced?
- How has the production method changed over time?
- How are the data preserved? For how long and under what conditions?
- How well are data creation and storage documented?
- Are there any simple graphical approaches that can be applied first? Which ones?
- What underlying trends do the graphical approaches suggest? Can these be verified?
- Once trends are verified, what are the major theories that could account for them?
Reference and Note
1. Moving range and individuals control charts are treated in Douglas C. Montgomery’s Introduction to Statistical Quality Control, sixth edition, John Wiley & Sons, 2009. The random-looking trace on both charts is evidence the linear trend is sufficient to explain the behavior of the success rate over time.
Joseph D. Conklin is a mathematical statistician at the U.S. Department of Energy in Washington, D.C. He earned a master’s degree in statistics from Virginia Tech and is a senior member of ASQ. Conklin is also an ASQ-certified quality manager, quality engineer, quality auditor and reliability engineer.