The Trusty Jackknife
Method identifies outliers and bias in statistical estimates
by I. Elaine Allen and Christopher A. Seaman
Outliers are a continual source of problems when analyzing data. A few questionable data points can skew your distribution, make significant results seem insignificant and generally ruin your day.
While you can’t simply throw away inconvenient data when it doesn’t support your hypothesis, there is a simple procedure to identify small subsets of data that influence statistical measures. It is called the jackknife.
Initially presented by John W. Tukey in an abstract in the Annals of Mathematical Statistics in 1958,¹ the jackknife is a resampling technique that is a special case of the bootstrap.² A relatively simple and straightforward procedure, it has been widely adopted as an estimator of bias for any statistic and as a way to examine the stability of a variance estimate.
The jackknife can be a useful tool in quality control estimation by identifying outliers and bias in statistical estimates. In this column, the jackknife procedure will be applied to meta-analysis as a way of identifying studies with large influence on the summary effect size estimate.
The jackknife procedure is a simple idea. For any summary statistic, the spread of individual values comprising this statistic can be examined by systematically eliminating each individual observation (or a group of observations) from a dataset, creating a set of "perturbed" summary statistics. The magnitude of the difference between the overall summary statistic and each jackknifed statistic is an estimate of that value's influence on the summary value.
For example, if you have scores of 1, 2 and 3, their mean is 2. The means by selectively eliminating an individual value and averaging the other two values are 1.5 (eliminating 3), 2 (eliminating 2) and 2.5 (eliminating 1). Observations might be considered outliers or points of high influence on the summary statistic when the effect of removing them from the dataset is disproportionately large.
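The leave-one-out means above can be reproduced in a few lines of Python:

```python
# Leave-one-out means for the scores 1, 2 and 3 (overall mean = 2).
scores = [1, 2, 3]
overall_mean = sum(scores) / len(scores)

loo_means = []
for i, left_out in enumerate(scores):
    reduced = scores[:i] + scores[i + 1:]          # drop one value
    loo_means.append(sum(reduced) / len(reduced))  # mean of the remaining two
    print(f"eliminating {left_out}: mean of the rest = {loo_means[-1]}")
```

Eliminating 1 gives 2.5, eliminating 2 gives 2.0 and eliminating 3 gives 1.5, matching the worked example.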
This is a useful and important technique because whenever a statistic is estimated, there is some degree of variability (or error) associated with it. In general, the procedure for performing a jackknife is:
1. Given a sample of size n and a sample estimate (for example, µ, the mean), divide the sample into g exhaustive and mutually exclusive subsamples of size h (in many, if not most cases, h will equal 1, so g = n).
2. Drop one subsample, say subsample j, from the original sample. This leaves a reduced sample of size (g − 1)·h. Calculate µ(−j), the estimate computed with that subsample removed.
3. Calculate the pseudovalue that measures the effect of dropping subsample j: µ*(j) = g·µ − (g − 1)·µ(−j).
4. Repeat steps 2 and 3 for all g subsamples, yielding a vector of g pseudovalues.
5. Take the mean of this vector to yield the overall jackknife estimate of µ.
Because the jackknife estimate of µ removes the leading-order bias, an estimate of the overall bias of the statistic is simply the difference between µ and its jackknife estimate.
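The steps above can be sketched in Python. This is a minimal sketch with subsample size h = 1, so g = n; for the mean, each pseudovalue g·µ − (g − 1)·µ(−j) reduces to the dropped observation itself, so the jackknife estimate reproduces the sample mean and the estimated bias is zero:

```python
def jackknife_mean(sample):
    """Jackknife estimate of the mean via pseudovalues (subsample size h = 1)."""
    g = len(sample)
    overall = sum(sample) / g
    pseudovalues = []
    for j in range(g):
        reduced = sample[:j] + sample[j + 1:]      # drop subsample j
        loo = sum(reduced) / (g - 1)               # estimate with j removed
        pseudovalues.append(g * overall - (g - 1) * loo)
    jack = sum(pseudovalues) / g                   # overall jackknife estimate
    bias = overall - jack                          # estimated bias of the statistic
    return jack, bias

est, bias = jackknife_mean([1.0, 2.0, 3.0])
```

The same loop works for any statistic; only the line computing the leave-one-out estimate changes.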
Applying the jackknife
In meta-analysis, it is the individual study's effect on the overall effect size that is of interest. It's important to examine the influence one study can have on the overall outcome and whether, when that study is removed, a significant effect size in one direction becomes insignificant or possibly significant in the opposite direction.
Including jackknife estimates in meta-analysis software is becoming standard, and their use as a validity check has started to appear in published meta-analyses. The first example uses real data summarizing quality-of-life outcomes from a new treatment for cancer. The second example uses some of the studies from the first example but perturbs others to be more extreme in their study summary statistic or in the size of the variance.
In both cases, the fixed-effects and random-effects models are shown. The difference between these models is that fixed-effects models control for within-study variability but assume the studies share a common true effect, so variability between studies is not modeled. Random-effects models account for variability both within and between studies and are more conservative. This can be seen in both examples, but especially in the second.
The first example shows how using the jackknife can give assurance that there is no bias introduced by specific studies in the meta-analysis. This is shown in Figures 1 and 2. Figure 1 shows the meta-analysis of eight studies, of which all are relatively consistent in their results, giving an overall effect size that is significant (p-value < 0.001).
Figure 2 displays the results of the jackknife estimates. The first line of Figure 2, Study 1, shows the overall estimate with Study 1 omitted.
Notice how consistent the jackknife estimates are, indicating the effect size estimate is not biased by the influence of any one study. You can conclude from the jackknife analysis that the results are consistent and valid.
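A leave-one-study-out jackknife under an inverse-variance fixed-effects model can be sketched as follows. The effect sizes and variances below are invented for illustration; they are not the data behind the figures in the column:

```python
import math

# Hypothetical effect sizes and within-study variances for eight studies
# (illustrative numbers only, not the studies in the figures).
effects   = [0.30, 0.25, 0.35, 0.28, 0.32, 0.27, 0.31, 0.29]
variances = [0.010, 0.020, 0.015, 0.012, 0.018, 0.011, 0.020, 0.016]

def fixed_effect(effects, variances):
    """Inverse-variance weighted (fixed-effects) summary and its standard error."""
    weights = [1.0 / v for v in variances]
    summary = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return summary, math.sqrt(1.0 / sum(weights))

overall, se = fixed_effect(effects, variances)
print(f"all studies: {overall:.3f} (SE {se:.3f})")

# Jackknife: recompute the summary with each study omitted in turn.
loo_estimates = []
for j in range(len(effects)):
    loo, loo_se = fixed_effect(effects[:j] + effects[j + 1:],
                               variances[:j] + variances[j + 1:])
    loo_estimates.append(loo)
    print(f"omitting study {j + 1}: {loo:.3f} (SE {loo_se:.3f})")
```

When, as here, the studies are consistent, each leave-one-out summary stays close to the overall estimate; a study whose omission moves the summary sharply is a candidate outlier or point of high influence.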
The second example is considerably more problematic, as it shows studies that are extremely variable, and the results of the jackknife example give different results depending on the meta-analysis model applied to the studies.
Studies 2 and 6 differ widely in their study statistics, with Study 2 significantly favoring the control and Study 6 significantly favoring the treatment (see Figure 3).
The summary statistics for the fixed and random-effects models are inconsistent, with the fixed-effects model significantly favoring treatment and the random-effects model showing no difference between treatment and control.
The conclusions might take several forms, and the jackknife estimates show quite different results.
Given that the results are so extreme, the first step is to return to the original studies and verify that the data are correct.
Next, examine the inclusion criteria to ensure all the studies meet them, and try to identify any moderating variables that might explain such extreme results.
Finally, given the huge variability among the estimates, it might not be appropriate to perform a quantitative meta-analysis of these studies at all.
[Online Figure 2: Random-effects jackknife estimates, eliminating single studies.]
References
1. John W. Tukey, "Bias and Confidence in Not Quite Large Samples" (abstract), Annals of Mathematical Statistics, Vol. 29, 1958, p. 614.
2. Bradley Efron, The Jackknife, the Bootstrap and Other Resampling Plans, Society for Industrial and Applied Mathematics, Philadelphia, 1982.
The meta-analysis software referenced in this column is Comprehensive Meta-Analysis, version 2.0, 2005. More information can be found at http://meta-analysis.com/index.html.
I. Elaine Allen is professor of statistics and entrepreneurship at Babson College in Wellesley, MA. She earned a doctorate in statistics from Cornell University in Ithaca, NY. Allen is a member of ASQ.
Christopher A. Seaman is a doctoral student in mathematics at the Graduate Center of City University of New York.