Match Game

Propensity scoring can help infer patient response in the real world

The gold standard for evaluating new medical treatments, devices or services is the randomized controlled trial. We randomize to ensure the subjects or patients who receive different treatments (or a placebo) in the trial are comparable.

However, more researchers are interested in the effect of a treatment or service in situations in which randomization is difficult or impossible. In addition, researchers are interested in examining how the control (or alternate therapy) group would have responded to the treatment or service.

One useful approach is propensity score methods: estimating, from a range of characteristics, the probability that each subject receives the treatment, then matching members of different groups on that probability score. Under certain assumptions, comparison of the matched groups can reveal the impact of the treatment or service in each group. Why estimate the probability that a subject received a certain treatment when it is already known what treatment they received?

By using the probability that one subject would have been treated (the propensity score) to adjust the estimate of the treatment effect, we create a quasi-experiment.

When we find two subjects with the same propensity score—one treated and one a control—we can think of these two subjects as being randomly assigned to each group because they have the same probability of being in either group, given their covariates.

In 1983, P.R. Rosenbaum and Donald B. Rubin published a paper proposing this approach.1 They followed this study with additional papers expanding their methods. Recently, Onur Baser reviewed multiple matching procedures and appropriate uses for propensity scores.2

While primarily developed for health-related clinical trials, these methods have been extended to economic3 and agricultural settings4 and can be used for any experiment in which different changes are being applied to a large sampling pool.

Propensity scoring

The basic idea behind propensity score methods is to:

  • Use standard logistic regression to estimate the probability of exposure for each subject in the data.
  • Construct sets of subjects with similar propensity score estimates in hopes that these groups are interchangeable in every way but the exposure itself.
  • Estimate the causal effects of exposure, for example by a conditional logistic regression contrasting people actually exposed with those not exposed.

The propensity score is defined as the conditional probability of a certain treatment given background variables and covariates: p(x) = Pr(Y = 1 | X = x),

in which p(x) is the propensity score, that is, the probability of being in the treatment group (Y = 1) given the observed covariates and characteristics X = x. Estimate this value by running a logistic regression with a binary dependent variable: Y = 1 if treated; Y = 0 otherwise.

The model includes all important subject and treatment characteristics and covariates. The propensity score is then taken as either the predicted probability (p) or its logit, log(p / (1 – p)).
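As a rough illustration of this estimation step, the sketch below fits a logistic model by plain gradient ascent and returns both forms of the score. Everything in it is an assumption made for the example: the simulated covariates (a standardized age and severity), the coefficients used to generate them, and the helper names fit_logistic and propensity. The list named treated plays the role of Y. In practice a statistics package's logistic regression routine would normally be used instead.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, treated, lr=0.5, n_iter=500):
    """Fit Pr(treated = 1 | x) by batch gradient ascent on the log-likelihood.

    X is a list of covariate rows (assumed roughly standardized);
    treated is a list of 0/1 indicators. Returns [intercept, b1, ..., bk].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, t in zip(X, treated):
            err = t - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, row):
    """Predicted probability of treatment for one subject."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))

# Simulated data (hypothetical): older, sicker subjects are treated more often.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]  # age_z, severity_z
treated = [1 if random.random() < sigmoid(-0.3 + 0.8 * a + 1.2 * s) else 0
           for a, s in X]

beta = fit_logistic(X, treated)
scores = [propensity(beta, row) for row in X]       # predicted probability p
logits = [math.log(p / (1.0 - p)) for p in scores]  # log(p / (1 - p))
```

Either scores or logits can be passed to the matching step; the logit spreads out values that are crowded near 0 or 1.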

Then each treated subject is matched to the nearest subject in the untreated or control group by the closest matching propensity score.

Different measures of distance between subjects can be used, for example Euclidean distance, Mahalanobis distance or Lorentzian distance. Stratification on the propensity score is an alternative to pairwise matching.5
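A minimal sketch of nearest-neighbor matching on the propensity score follows. It assumes greedy one-to-one matching without replacement, absolute difference in score as the distance, and a caliper (a maximum allowed distance, here 0.05) so that poor matches are left unmatched. The caliper, the function name match_nearest and the subject IDs are illustrative assumptions, not details from the methods cited above.

```python
def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: dicts mapping subject id -> propensity score.
    Returns a list of (treated_id, control_id) pairs; treated subjects
    with no control within the caliper are left unmatched.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Process treated subjects in order of increasing propensity score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Hypothetical propensity scores for three treated and four control subjects.
treated_scores = {"T1": 0.62, "T2": 0.33, "T3": 0.91}
control_scores = {"C1": 0.30, "C2": 0.60, "C3": 0.40, "C4": 0.10}

pairs = match_nearest(treated_scores, control_scores)
unmatched = sorted(set(treated_scores) - {t for t, _ in pairs})
```

Here T2 pairs with C1 and T1 with C2, while T3 has no control within the caliper and is dropped. Greedy matching depends on processing order, so other matching algorithms can produce different pairs from the same scores.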

Following this matching procedure, the new sample of matched subjects can be used to examine treatment effects with multivariate techniques.

Two approaches might be used for the resulting analyses:

  • A subgroup of well-matched subjects and controls can be analyzed as a separate cohort to create a quasi-experiment.
  • Weights can be created from propensity scores to adjust the subject and control groups and compensate for differences.
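The weighting approach can be sketched as follows. The sketch assumes inverse probability of treatment weighting, one common way to build weights from propensity scores: 1/p for treated subjects and 1/(1 – p) for controls. The function names and the four-subject dataset are hypothetical.

```python
def iptw_weights(treat, pscore):
    """Inverse probability of treatment weights.

    Treated subjects get 1/p, controls get 1/(1 - p), so subjects who are
    under-represented in their own group count for more.
    """
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p)
            for t, p in zip(treat, pscore)]

def weighted_mean_difference(outcome, treat, weights):
    """Weighted mean outcome of the treated minus that of the controls."""
    def wmean(group):
        num = sum(w * y for y, t, w in zip(outcome, treat, weights) if t == group)
        den = sum(w for t, w in zip(treat, weights) if t == group)
        return num / den
    return wmean(1) - wmean(0)

# Hypothetical data: treatment indicator, propensity score, outcome.
treat = [1, 1, 0, 0]
pscore = [0.8, 0.5, 0.5, 0.2]
outcome = [10.0, 8.0, 7.0, 5.0]

weights = iptw_weights(treat, pscore)
effect = weighted_mean_difference(outcome, treat, weights)
```

Scores very close to 0 or 1 produce very large weights, which is one reason to check overlap between the groups before relying on a weighted estimate.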

Steps in propensity scoring

Essentially, there are five steps in using the propensity scoring method:

  • Step one: Estimate propensity score using key covariates/characteristics in a logistic model.
  • Step two: Choose matching algorithm that provides “best” matching—in other words, a matching algorithm that retains most cases or provides the best likelihood of good matches.
  • Step three: Check overlap of matched groups with original experimental data.
  • Step four: Discard unmatched data and keep matched pairs.
  • Step five: Analyze the resulting dataset to estimate the treatment effect, and compare it with the estimate from the unmatched data.
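One way to carry out the check in step three is to compare covariate balance before and after matching. The sketch below assumes the standardized mean difference as the diagnostic, which is a common choice but is not prescribed by the steps above. The ages and the function name are made up for illustration.

```python
import math
import statistics

def standardized_mean_difference(x_treated, x_control):
    """Difference in group means divided by the pooled standard deviation."""
    m1, m0 = statistics.mean(x_treated), statistics.mean(x_control)
    v1, v0 = statistics.variance(x_treated), statistics.variance(x_control)
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)

# Hypothetical ages before matching: all treated and all control subjects.
age_treated_all = [60, 65, 70, 75]
age_control_all = [40, 45, 50, 62, 66, 72]

# Hypothetical ages after matching: only the matched pairs are kept.
age_treated_matched = [60, 65, 70]
age_control_matched = [62, 66, 72]

smd_before = standardized_mean_difference(age_treated_all, age_control_all)
smd_after = standardized_mean_difference(age_treated_matched, age_control_matched)
```

A value closer to zero after matching suggests the matched groups are more alike on that covariate. The same calculation would be repeated for each covariate in the propensity model.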

Figure 1 gives an example of aligning both treatment groups by propensity score, showing which subjects are matched and which are not.

Figure 1

Not always the answer

Propensity score matching is not a panacea for deficiencies in experimental design. Its benefits lie not only in potentially improved effect estimation, but also in revealing serious biases between treatment groups.

Creating matched samples and predicting exposure is a far better technique than directly modeling outcomes without matching. These methods also permit the use of diagnostics to examine the sensitivity of estimates as well as results for unmatched and matched groups.

There are some limitations to propensity score matching. Unlike randomization, propensity scores might not balance unobserved covariates, so bias might remain in the treatment estimation.

Also, if the propensity score estimation omits an important covariate, you will not know it, and the results might be biased. Finally, if there is limited overlap in the characteristics of the treatment groups based on the covariate analyses, significant portions of the dataset will be eliminated.

Overall, propensity score matching provides a unique method for comparing groups when randomization is not possible.


  1. P.R. Rosenbaum and Donald B. Rubin, “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, Vol. 70, 1983, pp. 41-55.
  2. Onur Baser, “Too Much Ado About Propensity Score Models? Comparing Methods of Propensity Score Matching,” Value in Health, Vol. 9, No. 6, 2006, pp. 377-385.
  3. Rajeev H. Dehejia and Sadek Wahba, “Propensity Score-Matching Methods for Nonexperimental Causal Studies,” The Review of Economics and Statistics, Vol. 84, No. 1, 2002, pp. 151-161.
  4. Lori Lynch, Wayne Gray and Jacqueline Geoghegan, “Are Farmland Preservation Program Easement Restrictions Capitalized Into Farmland Prices? What Can a Propensity Score Matching Analysis Tell Us?” Review of Agricultural Economics, Vol. 29, No. 3, 2007, pp. 502-509.
  5. Baser, “Too Much Ado About Propensity Score Models? Comparing Methods of Propensity Score Matching,” see reference 2.

I. ELAINE ALLEN is research director of the Arthur M. Blank Center for Entrepreneurship and professor of statistics and entrepreneurship at Babson College in Wellesley, MA. She earned a doctorate in statistics from Cornell University in Ithaca, NY. Allen is a member of ASQ.

CHRISTOPHER A. SEAMAN is a doctoral student in mathematics at the Graduate Center of City University of New York.
