Different Roads to Take for Data Analysis
by Christine Anderson-Cook
For most of us with some formal training in statistical methods, from a single course to an advanced degree, that education typically began with classical, or frequentist, methods for analyzing data. Later, we might have been shown a Bayesian analysis for the same problem.
Upon graduation, I sensed these methods were fundamentally different in their approaches to analyzing the data, the assumptions inherent in using the methods and the ways of interpreting the results. This information seemed jumbled and confusing.
I was left with the impression that statisticians and those who use statistical methods face a fork in the road early in their careers: a single irreversible decision, frequentist or Bayesian, that determines the path they follow from then on.
A complete comparison between the two approaches is certainly beyond the scope of this Statistics Roundtable column. But highlighting some of the similarities and differences between the frequentist and Bayesian approaches might help more engineers and scientists using statistical methods understand some of the trade-offs.
This will help in deciding which approach is better suited to a given analysis. Indeed, different problems are better suited to one approach or the other. Although there are many who will disagree with me, I believe the choice between frequentist and Bayesian paradigms is one that can be made appropriately on a case-by-case basis.
A useful starting point is to review some of the features common to both approaches:
- Both are parametric approaches with an underlying statistical model with parameters to be estimated.
- Both need to connect the observed data to the parameters through a specified relationship defined by a distribution in the likelihood function.
Recall that the likelihood function is closely related to the probability distribution function: it treats the joint probability function of the observed data as a function of the parameters of the selected model, with the goal of finding the best parameter values given the observed data.
Indeed, formulating the statistical model from subject-specific knowledge can be completely separated from the choice of analysis approach.
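To make the likelihood concrete, here is a minimal Python sketch. It assumes a binomial model and made-up data (7 successes in 20 trials), neither of which comes from this column. The function name binomial_log_likelihood is only illustrative. The data stay fixed while the parameter p varies, which is exactly the change of viewpoint described above.

```python
import math

def binomial_log_likelihood(p, successes, trials):
    """Log-likelihood of success probability p, up to an additive constant."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

# Hypothetical data, chosen only for illustration.
successes, trials = 7, 20

# Evaluate the log-likelihood on a grid strictly inside (0, 1)
# and keep the parameter value at which it is largest.
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: binomial_log_likelihood(p, successes, trials))

print(best_p)  # the grid maximizer matches the sample proportion 7/20
```

Both the frequentist and the Bayesian analysis of these data would start from this same function.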
Differences in Approaches
Where the two approaches diverge is in how the model is used for estimation and inference.
The frequentist approach commonly uses maximum likelihood to find the parameter values that make the observed data most likely under the specified likelihood function, or least squares estimation, which gives the same answer when the errors are assumed to be independent and normally distributed with constant variance.
The Bayesian approach seeks to integrate across the parameter space to combine the prior distribution of the parameters with the information from the data contained in the likelihood function to obtain a posterior distribution of the model parameters.
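The two estimation routes can be sketched side by side. The example below assumes a binomial likelihood, a Beta(2, 2) prior and the same made-up data of 7 successes in 20 trials. All of these are illustrative choices, not taken from this column. The posterior is approximated by multiplying prior and likelihood on a grid and integrating numerically across the parameter space, then checked against the known closed form for this conjugate pair.

```python
import math

def likelihood(p, successes, trials):
    """Binomial probability of the observed data for a given p."""
    return math.comb(trials, successes) * p**successes * (1 - p)**(trials - successes)

def beta_prior(p, a, b):
    """Beta(a, b) prior density for p."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * p**(a - 1) * (1 - p)**(b - 1)

successes, trials = 7, 20   # hypothetical data
a, b = 2.0, 2.0             # hypothetical prior, mildly centered on 0.5

# Frequentist route: the maximum likelihood estimate is the sample proportion.
mle = successes / trials

# Bayesian route: prior times likelihood, normalized by integrating over p
# (midpoint rule on a fine grid).
n_grid = 2000
width = 1 / n_grid
grid = [(i + 0.5) * width for i in range(n_grid)]
unnormalized = [beta_prior(p, a, b) * likelihood(p, successes, trials) for p in grid]
evidence = sum(unnormalized) * width
posterior = [u / evidence for u in unnormalized]
posterior_mean = sum(p * d for p, d in zip(grid, posterior)) * width

# For this conjugate pair the posterior is Beta(a + successes, b + failures).
exact_mean = (a + successes) / (a + b + trials)

print(mle, posterior_mean, exact_mean)
```

The maximum likelihood estimate is a single number, while the Bayesian result is a whole distribution (here summarized by its mean) that is pulled slightly toward the prior.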
Other differences include:
Parameters: Fundamentally, the frequentist approach treats the parameters to be estimated as fixed but unknown values. That is, there is a target “correct” value for each parameter, such as the population mean, that we seek to find. The Bayesian approach, on the other hand, treats the parameters as random variables that are unknown and have distributions.
Data and expert opinion: Given the assumed model, the frequentist approach relies on the data as the only source of information to help determine point and interval estimates for the parameter values.
This focus on the data removes subjectivity from the analysis and gives the data maximum impact in determining the most likely values.
Alternatively, the Bayesian approach combines prior distributions of the parameters with the added information from the data in the likelihood function. The priors are a subjective assessment, based on scientific or engineering knowledge, of what is known about the parameters before the data are collected. The result is an updated distribution for the parameters that reflects the combined information from both sources.
It is possible to use diffuse priors to reflect that not much is known a priori about the parameters, or the prior distribution can be quite narrow if much is already known about model parameters, from theoretical knowledge or previous studies.
The Bayesian approach allows the flexibility to formally incorporate specific knowledge from subject matter experts into the analysis. Depending on the goal of the study, it may be preferable to let the data stand alone with all results dependent on only what has been observed in the current study. Or it may be preferable to combine data and expert knowledge in a structured approach.
If the amount of data is small, then the frequentist and Bayesian approaches can give quite different results. As the amount of observed data grows, the two approaches typically converge to similar results.
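A small numerical sketch shows this convergence. It assumes a binomial model with a fairly strong Beta(8, 2) prior (centered on 0.8) and data in which 30% of trials are successes. Both are invented for illustration. For this conjugate model the posterior mean has a closed form, so no simulation is needed.

```python
def mle(successes, trials):
    """Maximum likelihood estimate of a binomial proportion."""
    return successes / trials

def posterior_mean(successes, trials, a, b):
    """Posterior mean of the proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

a, b = 8.0, 2.0        # hypothetical prior that disagrees with the data
observed_rate = 0.3    # hypothetical fraction of successes in the data

gaps = []
for trials in (10, 100, 1000, 10000):
    successes = round(observed_rate * trials)
    gap = abs(posterior_mean(successes, trials, a, b) - mle(successes, trials))
    gaps.append(gap)
    print(trials, round(gap, 4))
```

With 10 observations the prior dominates and the two estimates are far apart. By 10,000 observations the prior contributes almost nothing and the estimates nearly coincide.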
When I was taught this distinction, the difference between the two approaches for small to moderate amounts of data was portrayed as a disadvantage of the Bayesian approach. The subjectivity of results based on expert opinion was deemed undesirable.
However, I have now come to appreciate that you want the results to differ depending on the additional information you have added through the prior distribution. If they did not differ, then what would be the benefit of including additional knowledge from previous studies or expertise?
If there is some uncertainty or disagreement among the experts, a sensitivity analysis of different prior distributions can be considered to understand the influence of various changes to the prior distributions.
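One way such a sensitivity analysis might look is sketched below. The three priors, their labels and the data (7 successes in 20 trials) are all hypothetical, and the binomial model with Beta priors is chosen only because its posterior has a simple closed form.

```python
import math

def beta_posterior_summary(successes, trials, a, b):
    """Posterior mean and standard deviation under a Beta(a, b) prior."""
    a_post = a + successes
    b_post = b + trials - successes
    total = a_post + b_post
    mean = a_post / total
    variance = a_post * b_post / (total**2 * (total + 1))
    return mean, math.sqrt(variance)

# Hypothetical priors representing different expert views.
priors = {
    "diffuse Beta(1, 1)": (1.0, 1.0),
    "optimistic expert Beta(8, 2)": (8.0, 2.0),
    "pessimistic expert Beta(2, 8)": (2.0, 8.0),
}

successes, trials = 7, 20  # hypothetical data

summaries = {
    label: beta_posterior_summary(successes, trials, a, b)
    for label, (a, b) in priors.items()
}

for label, (mean, sd) in summaries.items():
    print(f"{label}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

If the conclusions of the study change materially across the rows, the disagreement among experts matters and deserves to be reported alongside the results.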
If there is only a small amount of data, then the frequentist approach can lead to very wide confidence intervals for the parameter estimates.
If there is good knowledge about the model parameters, the increase in precision of the estimates from incorporating that information into the prior distribution may be very beneficial.
As you might expect, if the knowledge added through the prior distribution turns out to be incorrect, then the resulting posterior distribution can be biased. Hence, it is important to incorporate only additional information that has some firm subject matter basis.
Another important difference between the two approaches is the interpretation of the interval estimates obtained for the unknown parameters.
Because the frequentist approach is trying to estimate a fixed but unknown constant, a 95% confidence interval should be interpreted as a range of sensible values produced by a procedure that, under repeated sampling or running of the experiment, would include the true parameter value 95% of the time. Thus the notion of the experiment being repeatable is intrinsic to the interpretation.
The Bayesian 95% credible interval for the unknown parameter (thought of as a random variable itself) gives the range of values for which there is a 95% probability (given the observed data and any prior knowledge included) of including the parameter value. This interpretation is natural for situations in which the experiment is not repeatable. It is also the interpretation commonly, although erroneously, given to a frequentist confidence interval.
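The two interpretations can be illustrated with a short simulation. This sketch assumes normally distributed data with a known standard deviation, a normal prior on the mean, and invented values for the true mean, sample size and prior. The function names are illustrative only. The first part repeats the experiment many times to check the confidence interval's long-run coverage. The second part computes a credible interval from a single data set.

```python
import random
from statistics import NormalDist, mean

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def confidence_interval(sample, sigma):
    """95% confidence interval for a normal mean with known sigma."""
    center = mean(sample)
    half = Z95 * sigma / len(sample) ** 0.5
    return center - half, center + half

def coverage(true_mean, sigma, n, repetitions, seed=0):
    """Fraction of repeated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = confidence_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / repetitions

def credible_interval(sample, sigma, prior_mean, prior_sd):
    """95% credible interval under a normal prior on the mean (known sigma)."""
    prior_precision = 1 / prior_sd**2
    data_precision = len(sample) / sigma**2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * mean(sample))
    half = Z95 * post_var**0.5
    return post_mean - half, post_mean + half

# Long-run behavior of the confidence interval procedure (hypothetical settings).
rate = coverage(true_mean=10.0, sigma=2.0, n=15, repetitions=2000)
print("coverage over repeated experiments:", rate)

# One data set, analyzed both ways (hypothetical prior: mean 9.5, sd 1.0).
rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(15)]
conf = confidence_interval(sample, sigma=2.0)
cred = credible_interval(sample, sigma=2.0, prior_mean=9.5, prior_sd=1.0)
print("confidence interval:", conf)
print("credible interval:  ", cred)
```

The 95% in the first part is a property of the procedure across many repetitions. The 95% in the second part is a probability statement about the parameter, given this one data set and the prior.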
The Bayesian approach is considerably more flexible for obtaining interval estimates for nonstandard questions.
For example, we might have a model for the data for which the model parameters are the mean and variance but be interested in estimating certain percentiles of the distribution of the observed responses, or a complicated expression in terms of several model parameters.
For the frequentist approach, interval estimates for such quantities typically cannot be obtained directly from the estimated parameter values. However, this estimation poses little problem for the Bayesian approach.
Finally, the computational intensity of the two approaches used to be a prominent feature in comparing them. Just 10 years ago, the implementation of the Bayesian approach for many problems was difficult, cumbersome and computationally very demanding.
However, with increased computing power and software such as WinBUGS (www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml), this is no longer such a large consideration. While there is still more software available for frequentist analyses, the gap is closing quickly and this is a much less important consideration than it once was when choosing between the two approaches.
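The percentile example can be sketched with posterior simulation. This is only one possible setup: it assumes normally distributed responses, the standard noninformative prior proportional to 1/variance, and twelve made-up measurements. Under those assumptions the variance can be drawn from a scaled inverse chi-square distribution and the mean from a normal distribution given that variance. Each joint draw is then converted into the quantity of interest, here the 95th percentile of the response distribution.

```python
import random
from statistics import NormalDist, mean, variance

def posterior_percentile_draws(sample, percentile, draws, seed=0):
    """Posterior draws of a response percentile for a normal model.

    Assumes the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2.
    """
    rng = random.Random(seed)
    n = len(sample)
    ybar, s2 = mean(sample), variance(sample)
    z = NormalDist().inv_cdf(percentile)
    results = []
    for _ in range(draws):
        # A chi-square variate with n - 1 degrees of freedom is Gamma(shape=(n-1)/2, scale=2).
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # Transform each joint draw into the percentile of interest.
        results.append(mu + z * sigma2**0.5)
    return results

# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 9.7, 10.3, 10.6]

draws = sorted(posterior_percentile_draws(sample, 0.95, 5000))
low = draws[int(0.025 * len(draws))]
high = draws[int(0.975 * len(draws))]
plug_in = mean(sample) + NormalDist().inv_cdf(0.95) * variance(sample) ** 0.5

print("plug-in estimate of the 95th percentile:", round(plug_in, 3))
print("95% credible interval:", round(low, 3), round(high, 3))
```

The same pattern applies to any other function of the parameters: compute it for each posterior draw and summarize the resulting values.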
I would encourage those using statistical methods to be open and consider both the frequentist and Bayesian approaches.
The frequentist approach can be a solid choice for estimating many functions of the model parameters in cases in which no previous knowledge is available about the model parameters or there is considerable uncertainty about what is known.
The Bayesian approach can be a practical and beneficial choice to consider when there is subject matter expert knowledge about the model parameters, the experiment does not lend itself to the frequentist interpretation of repeatedly collecting the data, or a nonstandard function of the model parameters is of interest.
- Gelman, Andrew, John B. Carlin, Hal S. Stern and Donald B. Rubin, Bayesian Data Analysis, CRC Press, 2004.
- Lee, Peter M., Bayesian Statistics: An Introduction, Oxford University Press, 1997.
CHRISTINE ANDERSON-COOK is a technical staff member of Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario, Canada. Anderson-Cook is a senior member of ASQ.