Cumulative Meta-Analysis

by I. Elaine Allen and Christopher A. Seaman

Meta-analysis is a set of statistical procedures designed to integrate and synthesize experimental results across independent studies into an overall summary statistic. Unlike traditional research methods, meta-analysis uses the summary statistics from individual studies as the data points.

Mostly used in education, psychology and medicine, meta-analysis can also be applied to quality control.1,2 Published studies are most often used in meta-analyses, but the methodology can also be applied to internal studies. And though meta-analyses are typically used to compare two treatments (or a new treatment with a control), they can also be used to examine two processes or a standard and an improved product.

A key assumption of meta-analysis is that each study provides an independent estimate of the underlying relationship within an unknown—and probably unknowable—population. By accumulating results across studies, meta-analysis offers new insights about the population and about the studies themselves. It allows researchers to:

  • Gain more statistical power as similar results from different studies are combined.
  • Provide a more accurate representation of the population relationship than is provided by the individual study estimators.
  • Cumulatively combine studies chronologically to identify when a characteristic or statistically significant change first occurs.
  • Understand the heterogeneity of the process or outcome being studied.

We will focus on the use of cumulative meta-analysis and an application in product improvement. Cumulative meta-analysis can also be directly applied in manufacturing to identify the time a process significantly changed or to compare old and new processes or products by synthesizing information from multiple experiments.

In cumulative meta-analysis, the experiments are accumulated from the earliest to the latest, with each successive step synthesizing all previous experiments. This chronological combining shows whether consecutive experiments give consistent results and indicates the point at which no further experiments are necessary because the results consistently favor one process, product or treatment.
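The accumulation can be sketched in a few lines of Python. This is a minimal fixed-effect illustration with hypothetical effect sizes and variances (negative effects favoring the newer process), not the authors' actual procedure or data:

```python
# Cumulative meta-analysis sketch: hypothetical (effect, variance) pairs,
# accumulated chronologically with inverse-variance weights. Flags the
# experiments at which the cumulative 95% CI excludes zero.
from math import sqrt

# Hypothetical effect sizes and variances, in chronological order.
experiments = [(-0.10, 0.09), (-0.25, 0.08), (-0.30, 0.10),
               (-0.22, 0.05), (-0.28, 0.06), (-0.24, 0.04)]

def cumulative_meta(experiments):
    """Yield (n, pooled_effect, ci_low, ci_high) after each experiment."""
    weights, effects = [], []
    for n, (y, var) in enumerate(experiments, start=1):
        weights.append(1.0 / var)          # inverse-variance weight
        effects.append(y)
        mu = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
        se = 1.0 / sqrt(sum(weights))      # SE of the weighted average
        yield n, mu, mu - 1.96 * se, mu + 1.96 * se

for n, mu, lo, hi in cumulative_meta(experiments):
    flag = "significant" if hi < 0 or lo > 0 else ""
    print(f"after {n}: {mu:+.3f} [{lo:+.3f}, {hi:+.3f}] {flag}")
```

In this toy sequence, the early cumulative intervals straddle zero and the later ones do not; the first experiment after which the interval stays on one side of zero is the candidate stopping point.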

Four Steps

A meta-analysis is a multistage process involving protocol, study identification, data extraction and synthesis:3

  1. Protocol: Not every study is relevant to the question at hand, so the first step is to specify the criteria for identifying suitable studies or experiments for the meta-analysis. A prospectively defined protocol specifying the criteria for inclusion and the data to be extracted is essential. These criteria should be operationally defined and decisive enough to triage each experiment unambiguously into or out of the analysis. They should specify the types of test and control conditions, as well as which reported outcomes each study must have. This step is somewhat simpler in quality control applications because tests of a new product or process are all conducted under similar test conditions.
  2. Study identification: The next step is to apply the criteria as a filter to find the studies needed. In clinical meta-analyses, this involves exhaustive searching of the literature for any and all published studies that meet the protocol-defined criteria. In a quality control application, however, all previous studies would already be housed and indexed internally.
  3. Data extraction: Each study or experiment that reaches this point should have relevant data to be extracted. So the next step is to calculate a result (usually called the effect size, point estimate or summary statistic) with an accompanying estimate of the variation the researchers would expect with studies of this type (the standard deviation, confidence interval or range).
  4. Synthesis: First, determine whether it is appropriate to calculate a synthesized average result across studies. If so, then calculate and present such a result. The type and calculation of this summary statistic depends on the type of data available (discrete vs. continuous variables) and whether it is a comparative synthesis.
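For continuous outcomes, step 3 can be as simple as computing a mean difference and its variance from each experiment's summary statistics. The sketch below uses hypothetical numbers and a hypothetical helper name; it is one common way to extract an effect size, not necessarily the one used in any particular study:

```python
# Step 3 (data extraction) sketch for continuous outcomes: from summary
# statistics, compute the difference in means (group 1 minus group 2)
# and its variance. All numbers here are hypothetical.
from math import sqrt

def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Return (effect, variance) for the difference in means m1 - m2."""
    effect = m1 - m2
    variance = sd1**2 / n1 + sd2**2 / n2   # variance of a difference in means
    return effect, variance

# One hypothetical experiment: (mean, SD, n) for a new vs. a standard product.
effect, var = mean_difference(4.1, 1.2, 30, 4.8, 1.1, 32)
se = sqrt(var)
print(f"effect = {effect:+.2f}, "
      f"95% CI [{effect - 1.96*se:+.2f}, {effect + 1.96*se:+.2f}]")
```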

Not every study in a meta-analysis is equally important. Studies that give more information by using a larger sample size or a smaller degree of variability should be given more credibility because their results are likely to be closer to the truth you are trying to estimate. The results of meta-analyses are often presented in a forest plot giving the point estimate and confidence interval for each study as well as a summary point estimate and summary confidence interval.

Meta-analysis is not a simple pooling of the data from multiple studies as if they were one large study. Simple aggregation results are usually incorrect. Instead, a meta-analysis looks at the results, sample size and variability within each study and then calculates a weighted average summary effect size.
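The difference between a naive average and a weighted one is easy to see numerically. In this hypothetical example, one large, precise study sits alongside two small, noisy ones:

```python
# Contrast an unweighted average of study effects with the inverse-variance
# weighted average a meta-analysis uses. Hypothetical data: one precise
# study (small variance) and two imprecise ones.
studies = [(-0.30, 0.01), (0.10, 0.25), (0.05, 0.36)]  # (effect, variance)

naive = sum(y for y, _ in studies) / len(studies)
weights = [1.0 / v for _, v in studies]
weighted = sum(w * y for (y, _), w in zip(studies, weights)) / sum(weights)

print(f"unweighted mean: {naive:+.3f}")    # pulled toward the noisy studies
print(f"weighted mean:   {weighted:+.3f}") # dominated by the precise study
```

The unweighted mean is near zero, while the weighted mean stays close to the precise study's estimate, which is the behavior the weighting is designed to produce.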

Two Types of Analysis

The two most common types of meta-analysis models are fixed effects models (FEMs)4 and random effects models (REMs).5 The Mantel-Haenszel FEM computes the effect estimate as a weighted average of the individual study estimates, each weighted by the inverse of the study variance. The equation for computing the effect estimate for the FEM is μ = Σ yᵢwᵢ / Σ wᵢ, where yᵢ is the effect estimate of study i and wᵢ = 1 / σᵢ² is the inverse of its variance.

By assuming the experiments are homogeneous and the estimate follows a normal distribution, a confidence interval can be computed in the usual manner with the standard error of the weighted average, SE = 1 / √(Σ wᵢ).

The REM assumes each study has its own mean, μᵢ, and variance, σᵢ², but that the μᵢ are drawn from a superpopulation of effects with overall mean, μ, and variance, τ², which describes the between-study heterogeneity.

As in the FEM, μ in the REM is estimated by a weighted average of the study effects, but the weights, 1 / (τ² + σᵢ²), are the inverses of the sums of the within-study variances, σᵢ², and the between-study variance, τ². When τ² = 0, so the treatment effects, μᵢ, are all the same, the REM reduces to the FEM. The DerSimonian and Laird estimate of τ² is used in the weighting formula. The equation for computing the effect estimate for the REM is μ = Σ yᵢwᵢ / Σ wᵢ, where wᵢ = 1 / (τ² + σᵢ²).

Because of the different weighting, experiments with a larger sample size have less effect on the REM estimate than on the FEM estimate. Confidence intervals are generally wider in the REM, and because the inclusion of τ² accounts for the nonrandom variability between studies, the REM gives more conservative estimates of variance.

The equation of the standard error for computing the confidence interval is SE = 1 / √(Σ (D + σᵢ²)⁻¹), where D denotes the DerSimonian and Laird estimate of the between-study variance, τ², and σᵢ² the variance of each experiment's effect size. The REM usually provides a more conservative estimate and is particularly useful in checking the robustness of a significant result obtained using FEMs. FEMs assume any variability between the results of experiments is completely random error, while REMs allow for experiment-specific errors. That's why we recommend and use REMs for all meta-analyses.
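The two models can be computed together in a short sketch. The function below follows the standard DerSimonian and Laird method of moments (Cochran's Q for heterogeneity, then a truncated-at-zero τ² estimate); the study effects and variances are hypothetical:

```python
# DerSimonian-Laird random effects estimate, sketched in the notation of
# the text: y_i are study effects, s2_i their within-study variances.
# Hypothetical data chosen to show visible between-study heterogeneity.
from math import sqrt

def dersimonian_laird(y, s2):
    """Return (mu, se, tau2) for the random effects model."""
    k = len(y)
    w = [1.0 / v for v in s2]                               # FEM weights
    mu_f = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)    # FEM estimate
    q = sum(wi * (yi - mu_f) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)          # between-study variance
    w_star = [1.0 / (tau2 + v) for v in s2]     # REM weights, 1/(tau2 + s2_i)
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = 1.0 / sqrt(sum(w_star))                # SE of the REM estimate
    return mu, se, tau2

y = [-0.60, 0.05, -0.45, -0.02, -0.35]
s2 = [0.04, 0.06, 0.05, 0.03, 0.07]
mu, se, tau2 = dersimonian_laird(y, s2)
print(f"REM estimate {mu:+.3f} ± {1.96 * se:.3f} (tau² = {tau2:.4f})")
```

With these numbers τ² comes out positive, so the REM standard error is larger than the FEM value 1 / √(Σ 1/σᵢ²), illustrating the wider, more conservative intervals described above.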


An Application in Product Improvement

The data in this application come from a consumer products company that introduced an improved version of an existing product. The primary objective of the project was to identify the degree of superiority of the new product by meta-analyzing the data from all experiments run by the company. During the course of the project, the company realized that if it had used meta-analysis as an ongoing step in the product's development, it could have shown the superiority of its new formulation sooner and launched the product a year earlier.

The company wanted to compare the new formulation (New) with its existing standard formulation (Standard) to ensure New's superiority prior to market launch. Before introducing New, the company conducted more than 200 blind, controlled internal experiments comparing Standard to New to validate its claims. After performing these experiments, the company concluded New was significantly better than Standard.

Using cumulative meta-analysis methods, the company could have stopped after only 20 experiments and launched the product earlier because the results of the cumulative meta-analysis revealed the significant difference between Standard and New was consistent for the 180-plus experiments conducted after that point. Each experiment involved the comparison between Standard and New, where the difference between them would be negative if the New formulation was superior to the Standard formulation.

This is illustrated in Tables 1 and 2, which show only the first 25 experiments. Table 1 shows the meta-analysis of each study separately, and Table 2 shows the effect of accumulating the results over studies and time. If only Table 1 is viewed, it is clear from the line labeled "Random" that New is superior, but not whether it is consistently superior over consecutive experiments. Only in Table 2 does this become clear, because the I-bars (confidence intervals) lie entirely below zero after 20 experiments, indicating a significant difference.6

Table 1

Table 2

The primary goal of the process was to identify a significant change, for which a noncumulative meta-analysis would suffice. However, using cumulative meta-analysis changed the way the company planned and analyzed experiments on new and standard products from that point forward.


  1. Matthias Egger, George D. Smith and Douglas G. Altman, editors, Systematic Reviews in Health Care: Meta-Analysis in Context, BMJ Books, 2001.
  2. Ralf Schulze, Heinz Holling and Dankmar Bohning, editors, Meta-Analysis: New Developments and Applications in Medical and Social Sciences, Hogrefe & Huber, 2003.
  3. The Cochrane Collaboration, Cochrane Reviewers' Handbook 4.2.2, 2004, www.cochrane.org/resources/handbook/.
  4. N. Mantel and W. Haenszel, "Statistical Aspects of the Analysis of Data From Retrospective Studies of Disease," Journal of the National Cancer Institute, Vol. 22, No. 4, 1959, pp. 719-748.
  5. Rebecca DerSimonian and Nan Laird, "Meta-Analysis in Clinical Trials," Controlled Clinical Trials, Vol. 7, No. 3, 1986, pp. 177-188.
  6. All calculations were conducted in Comprehensive Meta-Analysis Version 2.0, 2005, http://meta-analysis.com/index.html.

I. ELAINE ALLEN is professor of statistics and entrepreneurship at Babson College in Wellesley, MA. She earned a doctorate in statistics from Cornell University in Ithaca, NY and is a member of ASQ.

CHRISTOPHER A. SEAMAN is a statistical researcher at Human Services Research Institute in Cambridge, MA.
