A Preference for Parity?
Nobody gets fired up about trying just to match the competition
by Lynne B. Hare
YOU RARELY hear coaches encourage their teams to get out there and tie. So why do banks, restaurants, consumer goods companies and others try so hard to match their competition? I’ve actually heard brand managers tell researchers not to work too hard on a particular project. “We’re just trying to achieve parity.” Some achievement!
Parity has all the fireworks of the 5th of July. Whatever happened to “Go for the throat!” and “We’re going to grab the market”? Why would anyone think it requires more work to beat the competition than to achieve parity?
While I’m on the subject, what the heck is parity, anyway? If, when time runs out, two teams have the same score, does that mean they are equal in talent and capability? I doubt it. If parity means things are identical, then it can’t exist in the tangible world because no two man-made things are identical. So perhaps parity means the difference isn’t really noticeable or important. But important to whom, I wonder. Who gets to decide?
In the consumer goods industry, a common tool used to measure strength against competition is the preference test. Researchers present consumers with brand X and brand Y and ask them which they prefer. The order is usually position balanced: half get X first and then Y, while the other half get the reverse order. Testing can be done blind or branded, in house or in a central location like a shopping mall. But the objective is to learn how one product stacks up against another, and the testing is meant to be fair and unbiased.
If the preference test finds a difference at some reasonably low probability level, you can comfortably declare that a difference exists in the consuming population. But a fact that too often goes dismissed or misunderstood is this: If the test doesn’t detect a difference, you don’t get to say the samples are the same. Failure to find a difference doesn’t mean it doesn’t exist; it simply means you didn’t find it. Not finding it could mean you were unlucky, or it could mean the difference is smaller than the test can reliably detect. A test’s ability to detect a real difference is called its power.
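To make the point concrete, here is a short simulation (a hypothetical sketch, not from the article) in which one product really is preferred by 55% of the population, yet a one-sided test on only 100 respondents misses the difference most of the time:

```python
# Hypothetical illustration: "no significant difference" is not "no difference."
# With a true 55% preference for Y and only 100 respondents, a one-sided
# z-test at alpha = 0.05 fails to detect the real difference most of the time.
import math
import random

random.seed(1)

def detects_difference(n, true_p, alpha_z=1.645):
    """One-sided z-test of H0: p = 0.5 against p > 0.5."""
    prefs_y = sum(random.random() < true_p for _ in range(n))
    p_hat = prefs_y / n
    z = (p_hat - 0.5) / math.sqrt(0.25 / n)   # standard error under H0
    return z > alpha_z                         # 1.645 = one-sided 5% cutoff

trials = 10_000
hits = sum(detects_difference(100, 0.55) for _ in range(trials))
print(f"Detected the real 55% preference in {hits / trials:.0%} of tests")
```

With these assumed numbers the test's power is only about one in four, which is exactly the trap the column describes: a non-significant result here says nothing about parity.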
The next challenge is to build a test with the right comfort and power. By comfort, I mean a low probability of declaring a difference from competition when there is none (or very little). This is generally called a Type I error, and its probability is denoted by α (alpha). Similarly, missing an existing difference is called a Type II error, and its probability is denoted by β (beta). The power of the test is 1-β.
Companies should want to keep α low. That is, they should want to be sure that a declared improvement is real, the kind of change that might lead to growth. At the same time, they would want to keep β low so they don’t miss opportunities when they come along.
You can build a test to any specification you like (or can afford since testing isn’t free). You need to know how much difference you want to detect—call that δ—and you need to know the values of α and β.
For example, let’s say you want to improve formula X. If there is no difference in preference between it and its new and improved version, Y, then the percentage of the population preferring X will be 50%. The population proportion will be P0 = 0.50. Further, let’s say you’re willing to take a 5% chance of saying you’ve improved X when you really haven’t, α = 0.05.
At the same time, you would like to make sure that if Y is better to the point that it has a proportional acceptance as high as P1 = 0.55, or 55% of the population, you have an excellent chance of detecting it: 90% power, or β = 0.10. The difference between the assumed preference for the original formula and the new preference proportion you wish to detect is δ = 0.55 - 0.50 = 0.05, or 5%.
How many individual preferences should you obtain? There’s a formula for that:

n = [Zα √(P0(1 - P0)) + Zβ √(P1(1 - P1))]² / δ²
It looks scary, but it’s not that bad. Zα is the standard normal deviate corresponding to 5% of the area under the normal curve (Figure 1). Here, you put all 5% in the upper tail because you would move from formula X to Y only if you saw an increase in the preference proportion. Zβ is the standard normal deviate corresponding to 10% in the tail of the distribution, indicating a 10% chance that you might miss the improvement if it existed. The other symbols, P0, P1 and δ, are defined earlier. Plugging in the values from this example gives n = 853.
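For readers who would rather let a computer do the arithmetic, here is a minimal sketch of the sample-size calculation described in the text, using the example's values (P0 = 0.50, P1 = 0.55, one-sided α = 0.05, β = 0.10); the function name is my own invention:

```python
# Sample size for a one-sided preference test, per the formula in the text.
# NormalDist().inv_cdf supplies the standard normal deviates Z_alpha and Z_beta.
import math
from statistics import NormalDist

def preference_sample_size(p0, p1, alpha, beta):
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # e.g. 1.645 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)     # e.g. 1.282 for beta = 0.10
    delta = p1 - p0                             # difference worth detecting
    n = ((z_alpha * math.sqrt(p0 * (1 - p0))
          + z_beta * math.sqrt(p1 * (1 - p1))) / delta) ** 2
    return math.ceil(n)                         # round up to whole respondents

print(preference_sample_size(0.50, 0.55, alpha=0.05, beta=0.10))  # -> 853
```

Notice how quickly n grows as δ shrinks: halving the detectable difference roughly quadruples the required sample.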
A word of caution: don’t go to strangers with this. There are consumer testing “experts” who will say you don’t need to bother with the statistical rigor of sample size determination. Some will tell you there is an industry standard of a certain sample size, or this other sample size is where “the numbers stabilize.” My favorite is, “Well, I’m not a statistician, but … ” I always want to finish the statement with “I play one on television.”
There is no industry standard, and the numbers don’t stabilize, whatever that means. There is an effective equation, though a layperson might not think it’s a pretty sight, so get some sound statistical advice if you are uncertain.
Actually, the equation is just the beginning. There are real experts in decision science who can help with the determination of appropriate alphas, betas, deltas and so on. Certainly, the selection of alpha should depend on the cost, as well as the probability of declaring a difference when none exists; likewise for beta.
As a matter of fact, it is possible to balance the testing costs against the risks and associated costs to find a sample size with maximum expected net gain for any given test. But it takes detailed study, careful calculation and an understanding that there is no such thing as parity.
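As a rough illustration of that balancing act, here is a toy model (every dollar figure and assumption below is hypothetical, not from the article): expected net gain is taken as the probability of detecting a real improvement times the payoff from acting on it, minus a per-respondent testing cost, and we search for the sample size that maximizes it.

```python
# Toy model of balancing testing cost against risk. All payoffs and costs are
# hypothetical. Power is computed from the normal approximation used in the
# article's example (P0 = 0.50 vs. P1 = 0.55, one-sided alpha = 0.05).
import math
from statistics import NormalDist

P0, P1 = 0.50, 0.55          # assumed preference proportions
ALPHA = 0.05                 # one-sided Type I error probability
VALUE_OF_WIN = 50_000.0      # hypothetical payoff from a detected improvement
COST_PER_RESPONDENT = 25.0   # hypothetical cost of each interview

def power(n):
    """Probability of detecting the improvement if P1 is the truth."""
    z_alpha = NormalDist().inv_cdf(1 - ALPHA)
    crit = P0 + z_alpha * math.sqrt(P0 * (1 - P0) / n)  # decision threshold
    sd1 = math.sqrt(P1 * (1 - P1) / n)
    return 1 - NormalDist(P1, sd1).cdf(crit)

def expected_net_gain(n):
    return power(n) * VALUE_OF_WIN - COST_PER_RESPONDENT * n

best_n = max(range(50, 2001, 10), key=expected_net_gain)
print(best_n, round(expected_net_gain(best_n)))
```

Under these made-up economics the optimum lands well below the 90%-power sample of 853, because the last few points of power cost more than they are expected to return; change the payoff or the interview cost and the optimum moves. That sensitivity is exactly why the column recommends detailed study rather than a one-size-fits-all sample.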
Thanks to Keith Eberhardt for his always careful reviews and helpful suggestions.
LYNNE B. HARE is director of applied statistics at Kraft Foods Research in East Hanover, NJ. He received a doctorate in statistics from Rutgers University. Hare is a past chairman of ASQ's Statistics Division and is a fellow of ASQ and the American Statistical Assn.