ONE GOOD IDEA
Put to the Test
Formulas help determine data’s statistical significance
by Tony Gojanovic
Count data are everywhere, and they form the basis of many decision-making statistics. For example, you might have a count of a product’s nonconformances for a particular time period or a count of hospital beds occupied each day. Organizations are often interested in developing a measure of precision around their count estimates for comparative purposes.
Count data are never less than zero and can be without upper limits. They are typically modeled by the Poisson distribution, and most statistics books provide a formula for a typical confidence interval based on the actual count plus or minus some error term, which also is based on the count.
An alternate formula can be developed using a normal approximation. Specifically, an approximate 95% confidence interval for count data is:
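Lower bound = $\left[\left(\sqrt{x} + \sqrt{x + 1} - 1.96\right)^2 - 1\right] / 4$
Upper bound = $\left[\left(\sqrt{x} + \sqrt{x + 1} + 1.96\right)^2 - 1\right] / 4$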
In these formulas, x is the observed count, and the 1.96 factor sets the 95% interval width. The 95% refers to a 0.95 probability that an interval constructed this way captures the true mean count. The formulas work well when the counts are not close to zero.
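As a minimal sketch, the bounds can be computed in a few lines of Python. The function name `poisson_count_ci` is hypothetical, and the square-root form used here (inverting the Freeman-Tukey statistic from the reference list, treated as standard normal) is an assumption; it does reproduce the rounded intervals quoted in the examples.

```python
import math

def poisson_count_ci(x, z=1.96):
    """Approximate confidence interval for a Poisson mean from one count x.

    Assumed form: solve |sqrt(x) + sqrt(x + 1) - sqrt(4*lam + 1)| <= z
    for lam, which gives lam = ((sqrt(x) + sqrt(x + 1) -/+ z)**2 - 1) / 4.
    """
    if x < 0:
        raise ValueError("a count cannot be negative")
    s = math.sqrt(x) + math.sqrt(x + 1)
    # For counts near zero, s - z can go negative; clamp so the
    # lower bound never drops below zero.
    lower = max(0.0, (max(0.0, s - z) ** 2 - 1) / 4)
    upper = ((s + z) ** 2 - 1) / 4
    return lower, upper

if __name__ == "__main__":
    low, high = poisson_count_ci(69)
    print(f"count 69: {low:.1f} to {high:.1f}")
```

The `z` parameter defaults to 1.96 for a 95% interval; another normal quantile would give a different coverage level. The clamp at zero is a convenience added here, not something the formulas themselves specify.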
Problem: In the city and county of Denver, police reported 2,925 crimes from January to May 2012. During the same period in 2013, there were 2,796 crimes reported. The 2013 count represents a reduction of about 4% in reported crimes. Would you be able to say this is a statistically significant drop?
Solution: Using the formulas above, the 95% confidence interval for the true 2012 incident count is 2,820 to 3,032. For 2013 data, the confidence interval is 2,694 to 2,901 incidents. Even though 129 fewer crimes were reported in 2013, the highly overlapping intervals suggest that there was not a statistically significant shift in the crime rate.
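The Denver comparison can be sketched in code as follows. The helpers `square_root_ci` and `overlap_width` are hypothetical names, and the Freeman-Tukey-style bound formula inside `square_root_ci` is an assumption that happens to reproduce the rounded figures in the solution.

```python
import math

def square_root_ci(x, z=1.96):
    """Assumed Freeman-Tukey-style bounds for a Poisson mean (hypothetical helper)."""
    s = math.sqrt(x) + math.sqrt(x + 1)
    return ((s - z) ** 2 - 1) / 4, ((s + z) ** 2 - 1) / 4

def overlap_width(a, b):
    """Length of the overlap between intervals a and b (0.0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

# Reported crime counts, January to May of each year.
ci_2012 = square_root_ci(2925)
ci_2013 = square_root_ci(2796)
shared = overlap_width(ci_2012, ci_2013)

print("2012 interval:", [round(v) for v in ci_2012])
print("2013 interval:", [round(v) for v in ci_2013])
print("intervals overlap:", shared > 0)
```

A positive overlap width is only the quick visual check described in the text, not a formal test.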
Comparisons using confidence intervals are just a quick exploratory check. Overlapping intervals do not always indicate a lack of significance: two intervals can overlap to some extent, even up to 25%, and the difference can still be significant.
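When the overlap check is inconclusive, a direct comparison of the two counts is one possible follow-up. The sketch below uses a standard large-sample z test for two Poisson counts; it is a technique swapped in for illustration, not one of the formulas above. It assumes the two counts are independent and come from observation periods of equal length, and the name `compare_counts` is hypothetical.

```python
import math

def compare_counts(x1, x2):
    """Large-sample z test for two Poisson counts over equal exposure.

    Under the null hypothesis of equal rates, x1 - x2 has mean 0 and
    its variance is estimated by x1 + x2.
    Returns (z, two-sided p-value).
    """
    if x1 + x2 == 0:
        raise ValueError("at least one count must be positive")
    z = (x1 - x2) / math.sqrt(x1 + x2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p

# Denver reported crimes, January to May 2012 vs. 2013.
z, p = compare_counts(2925, 2796)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

For the Denver counts, z comes out near 1.7 with a two-sided p-value a little under 0.09, which agrees with the interval-based conclusion at the 5% level.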
Problem: As a way to increase public safety in a large metropolitan area, photo-enforcement cameras were posted on traffic lights, and the duration of yellow lights was lengthened at four intersections. Right-angle accidents for cars were recorded for 30 months prior to the new safety measures, and 69 total incidents at the four intersections were counted.
In the 30 months after the new measures were implemented, 21 right-angle accidents were recorded from all four intersections. Was there a statistically significant reduction in accidents?
Solution: Using the formulas noted earlier, the 95% confidence interval for right-angle accidents prior to using the risk-mitigation measures was 54 to 87 incidents. Post-risk mitigation shows an interval of 13 to 31 incidents.
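As a worked check, assume the bounds take the Freeman-Tukey square-root form $\left[\left(\sqrt{x} + \sqrt{x + 1} \mp 1.96\right)^2 - 1\right] / 4$. This form is an assumption, though it reproduces the rounded figures above. The arithmetic for the two counts then runs as follows:

$$
\begin{aligned}
x = 69:&\quad \sqrt{69} + \sqrt{70} \approx 16.673 \\
&\quad \text{lower} \approx \frac{(16.673 - 1.96)^2 - 1}{4} \approx 53.9, \qquad \text{upper} \approx \frac{(16.673 + 1.96)^2 - 1}{4} \approx 86.5 \\
x = 21:&\quad \sqrt{21} + \sqrt{22} \approx 9.273 \\
&\quad \text{lower} \approx \frac{(9.273 - 1.96)^2 - 1}{4} \approx 13.1, \qquad \text{upper} \approx \frac{(9.273 + 1.96)^2 - 1}{4} \approx 31.3
\end{aligned}
$$

Rounded to whole incidents, these give 54 to 87 before the measures and 13 to 31 after, with a gap of more than 20 incidents between the two intervals.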
Yes. The wide separation between the two intervals indicates a statistically significant difference in right-angle accidents. But it cannot be independently determined whether photo enforcement or yellow-light duration had a greater effect on public safety. Although the results are promising, more investigation would be required, including a determination as to whether overall city accidents were lower.
- Department of Safety Public Information Standards, "Reported Offenses Using NIBRS Definitions in the City and County of Denver," report, Denvergov.org, http://tinyurl.com/denvercrimestatistics.
- Freeman, Murray F., and John W. Tukey, "Transformations Related to the Angular and the Square Root," The Annals of Mathematical Statistics, Vol. 21, No. 4, 1950, pp. 607-611.
- Gallagher, Dennis J., "Denver Photo Enforcement Program: Performance Audit," report, Office of the Auditor—Audit Services Division, December 2011, p. 24.
- Hoaglin, David C., Frederick Mosteller and John W. Tukey, Exploring Data Tables, Trends and Shapes, John Wiley and Sons, 1985, p. 408.
- Van Belle, Gerald, Statistical Rules of Thumb, John Wiley and Sons, 2002, p. 40.
Tony Gojanovic is a statistician at MillerCoors in Golden, CO. He has a master’s degree in statistics and computer science from the University of Colorado in Denver and is a member of ASQ and the American Statistical Association.