The Sin of Spin
Don’t accept ambiguity; insist on ‘absolute’ information
by Christine M. Anderson-Cook
We live in an age in which media and marketing often spin data and end up misleading and misinforming the consumer. Consider these recent headlines that left some obvious questions unanswered:
- "Foreclosure auctions drop more than 30%." From when to when? What is the rate now? Is this usually a volatile rate that naturally fluctuates quite a bit?
- "South Carolina’s unemployment rate shot from 5.5% in February 2008 to 12.5% last January." The accompanying article described how the rate fluctuated in 2010 and was 10.7% in November. Why was this window of time selected?
A lot of advertisers and news outlets seek to sensationalize their messages to catch our attention. Often, the way information is communicated in the workplace seems to have borrowed from media and marketing’s approach.
Recently, I have been resensitized to these misleading practices of data presentation by reading books by Gerd Gigerenzer1 and Donald J. Wheeler.2 Both authors highlight the importance of presenting data in a nondeceptive format: one that refrains from prejudicing the audience and gives sufficient information for the assessment or decision to be made independently.
Their ideas build on the work of Edward Tufte.3–6 Key messages are:
- Give the raw data in its natural form (absolute summaries), which is strongly preferred to relative comparisons from one observation to another.
- Provide sufficient historical data to allow realistic evaluation of the recent changes, taking into account natural fluctuation and previous trends.
- When a quantity is estimated, include a quantification of the measurement uncertainty along with the point estimate.
Consider the following three examples of deceptive or ambiguous statements that illustrate how we might adapt from soundbites and headlines to refocus data presentation to be maximally informative and minimally deceptive:
‘Production is up 10%’
This sounds catchy and impressive, but should this result get you excited? Gigerenzer highlights how the human mind is naturally predisposed to filling in missing information to give the statement context to make it understandable. Without additional details, you are unable to know if this is an important fact.
What other key information should be provided for you to make an enlightened assessment of this statement?
First, you need to know the comparative time period you are relating this interval to: Are you looking at this month’s production compared with last month? Compared with this month last year? Compared with average production in this month for the last 10 years?
Second, when the comparison is based on a single previous time period, it is helpful to acknowledge the natural variation between observations. Figure 1 shows four different plots with a 10% change in production from the last month to this month.
In all cases except the first (A), we are unlikely to think this change is indicative of a real change in production. If last month’s observation represented a 13% drop in production from the previous month (B), then you are likely to be less impressed with this month’s increase: a 10% rise after a 13% drop still leaves production below its level of two months ago (0.87 × 1.10 ≈ 0.96).
Similarly, if there is a seasonal trend (C), the increase in production might coincide with the regular annual pattern, and you would likely be better informed by looking at production compared to the average for this month in other years. Finally, (D) shows a high variability process in which fluctuations of 10% are not unexpected, and you should likely react only when the change falls outside the range of the natural variation of the process.
Your interpretation of the 10% change is very much a function of understanding the pattern of change in recent times. To assess the importance of this change, it would be ideal for the comparison to be made relative to similar months of data (for instance, the average of months with similar seasonal patterns for several years) and with the associated uncertainty of production appropriately characterized.
Also, a simple time series plot with enumerated scale shown on the y-axis—and with sufficient history to capture seasonality—is an effective summary to provide a compact and suitable context for interpretation. The inclusion of the actual production numbers and recent history fills in the necessary details, and allows the audience to decide for itself if the change should be considered unusual and important.
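The idea of judging the latest change against the process's own history can be sketched in a few lines. Everything here is hypothetical: the production figures are invented, and the mean plus or minus three standard deviations band is only a rough stand-in for proper process-behavior limits of the kind Wheeler describes.

```python
# Sketch: judging a 10% month-over-month jump against the history's own
# natural variation. Production figures are hypothetical (units arbitrary).
production = [1040, 980, 1015, 995, 1060, 930, 1005, 970, 1020, 1000, 1100]

# Month-over-month percentage changes for the whole record.
changes = [(b - a) / a * 100 for a, b in zip(production, production[1:])]
latest, history = changes[-1], changes[:-1]

mean = sum(history) / len(history)
sd = (sum((c - mean) ** 2 for c in history) / (len(history) - 1)) ** 0.5

# A crude "natural variation" band: mean +/- 3 standard deviations of
# past month-over-month changes.
lower, upper = mean - 3 * sd, mean + 3 * sd
print(f"latest change: {latest:.1f}%, natural band: [{lower:.1f}%, {upper:.1f}%]")
print("unusual" if not (lower <= latest <= upper) else "within natural variation")
```

With these invented figures the 10% jump sits comfortably inside the band of past fluctuations, which is exactly the point of panel D: without the history, the headline alone cannot tell you that.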
‘Defect rate doubled last quarter’
Defect rates are typically estimated by sampling from production throughout the time interval. The defect rate change is given relative to a previous time period, but because defect rates across different production environments vary considerably, it is more critical to understand the true defect rates to assess the practical importance of this change.
If your focus is on the yield of the process, a defect rate change from one in 50 to two in 50 might have a much higher impact than a defect rate change from one in 100,000 to two in 100,000. If your focus is on safety, any change in defect rate might be considered quite important.
Depending on the sampling rate and the cost of testing, the uncertainty associated with the estimates of defect rates can vary substantially. If the point estimate for defect rate doubled but remained within a 95% uncertainty interval for the rate (for instance, 0.002 ± 0.002 for the previous quarter to 0.004 ± 0.0025 for the current quarter), it is possible the nature of the sampling procedure might explain a large portion of the observed change.
But if the associated uncertainty is much smaller (for instance, 0.002 ± 0.0005 for the previous quarter to 0.004 ± 0.0005 for the current quarter), the observed change in rates is unlikely to be explained by the sampling process and likely is due to a real change in the defect rate. It is also important, however, to consider the practical importance of the observed change.
To clear up this case, show the absolute defect rate with the associated uncertainty of the estimate for the comparison quarter and the new quarter. This helps to calibrate the absolute change and the importance of the change given the intended use of the parts.
In addition, a summary plot of recent trends in the defect rate using a time series plot with the uncertainty included will help assess the longer-term trend, as well as the natural fluctuations in the estimates given the sampling and testing procedure.
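A short sketch shows how the sampling effort drives the width of these intervals. It uses the standard normal-approximation interval for a proportion; the quarterly counts are invented, and exact binomial intervals would be preferable at very low rates.

```python
import math

def defect_rate_interval(defects, n, z=1.96):
    """Approximate 95% interval for a defect rate (normal approximation;
    a sketch -- exact binomial methods are better at very low rates)."""
    p = defects / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), p + half

# Hypothetical quarters: the point estimate doubles in both scenarios,
# but the sample size n controls how seriously we should take it.
scenarios = {
    "small sample": (4, 2000, 8, 2000),
    "large sample": (120, 60000, 240, 60000),
}
for label, (d_prev, n_prev, d_curr, n_curr) in scenarios.items():
    p1, lo1, hi1 = defect_rate_interval(d_prev, n_prev)
    p2, lo2, hi2 = defect_rate_interval(d_curr, n_curr)
    overlap = lo2 <= hi1  # do the two intervals overlap?
    print(f"{label}: {p1:.4f} [{lo1:.4f}, {hi1:.4f}] -> "
          f"{p2:.4f} [{lo2:.4f}, {hi2:.4f}], overlap={overlap}")
```

With 2,000 parts sampled per quarter the intervals overlap, much like the 0.002 ± 0.002 case above; with 60,000 sampled they do not, much like the ± 0.0005 case, so the doubling then points to a real change.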
‘A 10% temperature increase gave a 15% yield increase’
The final example illustrates the importance of understanding units and how reporting the absolute numbers, rather than relative change, can improve interpretability. The data that led to this headline originated from a laboratory study in which different production environments were considered.
The default production temperature was 100°F, and it was found that a change to 110°F (the 10% increase) produced the observed increase in yield from 72% to 82.8%.
Given the actual numbers, you could formulate a number of alternative headlines, all of which appear to characterize the results but are similarly lacking in real information. You could use degrees Celsius (100°F = 37.8°C and 110°F = 43.3°C, giving a 14.6% increase) or report the defect rate (a 72% yield corresponds to a 28% defect rate and an 82.8% yield to a 17.2% defect rate, giving a 38.6% reduction in defects).
Hence, in addition to the original, the same absolute results could translate into any of the following misleading or incomplete headlines:
- A 14.6% increase in temperature (C) gave a 15% increase in yield.
- A 10% increase in temperature (F) gave a 38.6% reduction in defects.
- A 14.6% increase in temperature (C) gave a 38.6% reduction in defects.
Clearly, the percentage changes are highly dependent on the summary chosen and give different impressions of the study’s results. There are several other important errors in this headline.
First, the percentage increase of temperature is actually meaningless. Percentages assume the zero on the scale corresponds to something absolute. Here, 0°C or 0°F are relatively arbitrary and do not represent a starting point for the scale against which percentage changes can be sensibly measured.
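A few lines make the arbitrariness concrete: converting the same physical change (100°F to 110°F) to different scales yields very different "percentage increases." Only the Kelvin scale, whose zero is physically meaningful, gives a ratio that is even defined in principle.

```python
# Sketch: the "percentage increase" of a temperature depends entirely on
# which scale's zero you adopt -- evidence that it is not a meaningful summary.
def f_to_c(f):
    return (f - 32) * 5 / 9

def f_to_k(f):
    return f_to_c(f) + 273.15

low_f, high_f = 100.0, 110.0
for name, conv in [("Fahrenheit", lambda f: f),
                   ("Celsius", f_to_c),
                   ("Kelvin", f_to_k)]:
    a, b = conv(low_f), conv(high_f)
    print(f"{name}: {a:.1f} -> {b:.1f}, a {100 * (b - a) / a:.1f}% increase")
```

The exact conversion gives about 14.7% on the Celsius scale (the 14.6% quoted earlier comes from rounding the converted temperatures to one decimal place first), and only about 1.8% on the Kelvin scale: three different headlines from one identical change.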
Perhaps even more misleading is the idea that a temperature change is in any way comparable to a change in yield. It might make sense to compare a change in input costs of production (how much does it cost to raise the temperature from 100°F to 110°F?) against change in output yield, but the headline is a classic apples-to-oranges comparison that lacks intrinsic meaning.
Complete and self-contained
There is no substitute for providing complete information on the absolute scale—it allows the audience to directly assess the context and importance of the information. Providing a graphical or numerical summary of recent history is also valuable for enhancing the context and incorporating a measure of natural variation. When the quantities of interest are obtained by estimation, the uncertainty associated with this should be included as well.
While catchy headlines and relative summaries have the opportunity to be attention-grabbers, statisticians and those who report data-based results should resist these tactics and provide a complete, self-contained summary that includes all of the key information from which to make an informed decision.
- Gerd Gigerenzer, Calculated Risks: How to Know When Numbers Deceive You, Simon & Schuster, 2002.
- Donald J. Wheeler, Understanding Variation: The Key to Managing Chaos, SPC Press, 2000.
- Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, 2001.
- Edward Tufte, Envisioning Information, Graphics Press, 1990.
- Edward Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative, Graphics Press, 1997.
- Edward Tufte, Beautiful Evidence, Graphics Press, 2006.
Christine M. Anderson-Cook is a research scientist at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of the American Statistical Association and a senior member of ASQ.