Keith M. Bower, Minitab Inc.
The chi-square test isn't always the best choice.
Six Sigma practitioners occasionally conduct studies to assess differences between items such as operators or machines. When the experimental data are measured on a continuous scale (measuring nozzle diameter in microns, for example), a procedure such as Student's two-sample t-test may be appropriate.(1) When the response variable is recorded using counts, however, Karl Pearson's chi-square test may be employed.(2)
But when the number of observations obtained for analysis is small, the chi-square test may produce misleading results. A more appropriate form of analysis (when presented with a 2 × 2 contingency table) is to use R.A. Fisher's exact test.
Example
On the Late Show With David Letterman, the host (David Letterman) and the show's musical director (Paul Shaffer) frequently assess whether particular items will or will not float when placed in a tank of water. Let's assume Letterman guessed correctly for eight of nine items, and Shaffer guessed correctly for only four of nine items. Let's also assume all the items have the same probability of being guessed correctly.

You would typically use the chi-square test when presented with the contingency table results in Figure 1. In this case, the chi-square test assesses what the expected frequencies would be if the null hypothesis (equal proportions) were true. For example, if there were no difference between Letterman's and Shaffer's guesses, you would expect Letterman to have been correct six times (see Figure 2). This is calculated as (9 × 12) / 18 = 108 / 18 = 6.
The resulting p-value, 0.046, from the chi-square test indicates there is a statistically significant difference (at the α = 0.05 level) in the success rates of Letterman and Shaffer.
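The expected counts and the chi-square p-value quoted above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function names are illustrative, and it assumes the statistic is computed without a continuity correction, since that version gives the 0.046 reported here.

```python
import math

# Observed counts from the example: (correct, incorrect) for each guesser.
observed = {
    "Letterman": (8, 1),
    "Shaffer": (4, 5),
}

def expected_counts(table):
    """Expected cell counts under the null hypothesis of equal proportions."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    # Each expected count is (row total * column total) / grand total.
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def pearson_chi_square(table):
    """Pearson chi-square statistic, with no continuity correction (assumption)."""
    rows = list(table.values())
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, exp)
        for o, e in zip(obs_row, exp_row)
    )

def chi_square_p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

expected = expected_counts(observed)
stat = pearson_chi_square(observed)
p_value = chi_square_p_value_1df(stat)

print(expected)            # expected counts: 6 and 3 in each row
print(round(stat, 3))      # chi-square statistic
print(round(p_value, 3))   # p-value, about 0.046
```

The closed-form p-value is valid only for a 2 × 2 table (one degree of freedom); a larger table would need a general chi-square distribution function from a statistics library.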
As Fisher discusses, however, "The treatment of frequencies by means of χ² is an approximation, which is useful for the comparative simplicity of the calculations. The exact treatment is somewhat more laborious, though necessary in cases of doubt."(3)
Practitioners run into a problem when an expected count is less than five (this is the doubt Fisher alludes to in his statement). Sometimes it's appropriate to group certain categories to avoid the problem, but this is clearly not possible when there are only two categories. As shown in Figure 2, there are two cells in which the expected counts are less than five.
Fisher's exact test considers all the possible cell combinations that would still result in the marginal frequencies as highlighted (namely 9, 9 and 12, 6). The test is "exact" because it uses the exact hypergeometric distribution rather than the approximate chi-square distribution to compute the p-value.
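To make "all the possible cell combinations" concrete, the sketch below lists every 2 × 2 table that keeps the row totals at 9 and 9 and the column totals at 12 and 6. The function name is hypothetical, and the layout (rows are Letterman and Shaffer, columns are correct and incorrect) is an assumption based on the example.

```python
def tables_with_margins(row_totals, col_totals):
    """List every 2 x 2 table of non-negative counts with the given margins."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must have the same sum")
    tables = []
    # With the margins fixed, the top-left cell determines the other three.
    for a in range(max(0, c1 - r2), min(r1, c1) + 1):
        tables.append(((a, r1 - a), (c1 - a, r2 - (c1 - a))))
    return tables

# Rows: Letterman, Shaffer. Columns: correct, incorrect.
for table in tables_with_margins((9, 9), (12, 6)):
    print(table)
```

Only seven tables are possible here, with Letterman's correct guesses ranging from three to nine, which is why the exact calculation is feasible by hand for a data set this small.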
The resulting p-value using Fisher's exact test is 0.1312. Therefore, you would fail to reject the null hypothesis of equal proportions at the α = 0.05 level. This contradicts the result from the chi-square test and indicates the chi-square test provided a poor approximation to the exact result.
The computations involved in Fisher's exact test can be extremely time-consuming to carry out by hand, but they are shown in the sidebar "Calculations for Fisher's Exact Test" for illustration.(4) Clearly, it's much easier to use a statistical software package to obtain these results.
Implications
It's appropriate to use Fisher's exact test, particularly when dealing with small counts. The chi-square test is basically an approximation of the results from the exact test, so erroneous results could potentially be obtained when there are few observations. This could lead to incorrect conclusions in Six Sigma projects.
References
Bibliography
Fleiss, Joseph L., Statistical Methods for Rates and Proportions,
John Wiley and Sons, 1981.
Montgomery, Douglas C., and George C. Runger, Applied Statistics and
Probability for Engineers, second edition, John Wiley and Sons, 1999.
Calculations for Fisher's Exact Test
The hypergeometric probability distribution is used to compute the probability of the observed results (see Table 1).
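As a worked illustration of that hypergeometric calculation, the observed table corresponds to Letterman's nine guesses containing 8 of the 12 correct guesses and 1 of the 6 incorrect guesses, out of 18 guesses in total. The notation below is an assumption made for this sketch, not necessarily the layout used in Table 1.

```latex
P(\text{observed})
  = \frac{\binom{12}{8}\binom{6}{1}}{\binom{18}{9}}
  = \frac{495 \times 6}{48620}
  = \frac{2970}{48620}
  \approx 0.061086
```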
The remaining tables that are consistent with the marginal frequencies of 9, 9 and 12, 6, along with their associated probabilities, are shown in Table 2.
To compute the p-value for Fisher's exact test, look at the tables with probabilities less than or equal to the probability of the observed results (0.061085972). These are the tables highlighted in Table 2. Add these probabilities together, along with the probability of the observed results, to obtain the p-value for the test.
This particular p-value is 0.13122.
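The summation described above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name and argument order are illustrative, and it applies the rule stated in this sidebar (sum the probabilities of all tables no more likely than the observed one). Integer counts are compared rather than floating-point probabilities, so ties are handled exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    Returns (probability of the observed table, p-value).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def ways(k):
        # Number of ways to get k in the top-left cell with the margins fixed.
        return comb(col1, k) * comb(total - col1, row1 - k)

    observed_ways = ways(a)
    all_ways = comb(total, row1)

    # Sum over every table that is no more likely than the observed table.
    extreme_ways = sum(
        ways(k)
        for k in range(max(0, col1 - row2), min(row1, col1) + 1)
        if ways(k) <= observed_ways
    )
    return observed_ways / all_ways, extreme_ways / all_ways

# Letterman: 8 correct, 1 incorrect. Shaffer: 4 correct, 5 incorrect.
p_observed, p_value = fisher_exact_two_sided(8, 1, 4, 5)
print(round(p_observed, 6))  # probability of the observed table
print(round(p_value, 5))     # two-sided p-value
```

For this example, four of the seven possible tables (Letterman correct 3, 4, 8 or 9 times) are at least as unlikely as the observed one, and their probabilities sum to the p-value reported in the article.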