Unraveling Bayes’ Theorem
Understand conditional probability to be more confident when interpreting results
by Christine M. Anderson-Cook
Many of us have a love-hate relationship with conditional probabilities. Yet they are a function of our daily lives. We update our knowledge with new information, and what we believe evolves as new information gets integrated into our thought processes. But the flipside is that formally calculating probabilities using conditional components can be confusing and, if done incorrectly, can be misleading.
I recently had two conversations with friends that reminded me of how tricky conditional probability questions can be. The first involved a colleague getting a positive (bad) result on a mammogram and sorting out what this meant. The second had to do with identifying defective lots from a manufacturing process by testing individual parts in the lot.
Surprisingly, the same issue was at the heart of both situations, and there is a helpful approach to understanding these probabilities that can add confidence to your ability to interpret results.
Let’s start with the manufacturing example and return to the medical issue at the end of the column. Here are basics from the second example: In this manufacturing process, 20 parallel production lines are running simultaneously, and product is packaged into boxes, each with 200 items.
- With the inventory for a week of production sitting in storage, it is discovered that one of the lines has a defective machine, which leads to 60% of the product in a box from that line being defective.
- When everything in the process is working properly, the regular defect rate is 4%.
- Boxes are stored in a warehouse with no identification of which production line they came from.
The goal was to find the boxes with high defect rates before they were shipped to customers. To save testing costs, the proposed strategy was to test a single item from each box and—based on that outcome—decide whether the box was from the defective line. The confusing point arose when trying to evaluate the effectiveness of this strategy. Various people involved in the decision-making process had contradictory answers to the probability of correctly classifying boxes.
Let’s examine this more formally in the context of Bayes’ theorem and conditional probabilities. First, here’s a bit of notation and a more formal definition of the problem:
- Let BD denote that a box is actually from the defective line.
- Let DL denote that we declare a box is from the defective line because the single tested unit failed.
Also, let P(A | B) mean the probability of A given that B is true. Then, correctly classifying the boxes means that you want to find both P(BD | DL) = "the probability that the box is defective if you declare it to be defective," and P(not BD | not DL) = "the probability that a box is not defective if you declare it not defective" to be high.
This might seem like a specialized problem, but you actually encounter variations of it in other segments of life, as you’ll see shortly with the mammogram example.
So how should you evaluate the test procedure? First, you must sort what information is available. Because there are 20 parallel similar lines, statement one above says: P(BD) = 1/20 = 0.05, or about 5% of all the boxes contain too many defective parts. Because 60% of the product from the defective line is defective and we are testing just a single unit from a box, statement two gives P(DL | BD) = 0.6. Similarly, P(DL | not BD) = 0.04 because you are assuming that the regular defect rate of 4% applies to the other production lines.
Bayes’ theorem says we can calculate the probabilities we want with the following formula:

P(BD | DL) = P(DL | BD)P(BD) / [P(DL | BD)P(BD) + P(DL | not BD)P(not BD)].
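As a quick sanity check, the formula can be evaluated directly. Here is a minimal Python sketch (the function and argument names are my own, not from the column) that plugs in the production-line numbers:

```python
def bayes_posterior(prior, true_positive_rate, false_positive_rate):
    # P(BD | DL): probability the box really is from the defective line,
    # given that the single tested unit failed.
    numerator = true_positive_rate * prior                       # P(DL | BD) P(BD)
    denominator = numerator + false_positive_rate * (1 - prior)  # total P(DL)
    return numerator / denominator

# P(BD) = 0.05, P(DL | BD) = 0.6, P(DL | not BD) = 0.04
print(bayes_posterior(0.05, 0.6, 0.04))  # roughly 0.44
```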
But this formula feels daunting to many. An excellent alternative approach, using natural frequencies as suggested by Gerd Gigerenzer,1 is to construct a tree that describes the information you have by partitioning an imaginary population into the four possible outcomes of BD/not BD crossed with DL/not DL. Figure 1 shows the tree for the production line problem.
First, take a large number of boxes (for example, 1,000) to be divided among the possible outcomes. The first split is whether the box came from the defective production line. Because 5% of the boxes are from the defective line, 50 of the 1,000 boxes go in the left branch of the tree (BD); the remaining 950 (95% of 1,000) go in the right branch (not BD). For each branch, we then apply what we know about the probability of selecting a defective part from a box in that branch. The probability of selecting a defective part from a box from the defective line is 60%, so 30 of the 50 boxes end up in the left-most branch, which corresponds to "BD & DL," and the remaining 20 boxes under BD are placed in "BD & not DL." Similarly, for the not BD branch, 0.04 * 950 = 38 boxes go in "not BD & DL" and (1 – 0.04) * 950 = 912 boxes go in "not BD & not DL."
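The same tree can be tallied in a few lines of code. This Python sketch (variable names are my own) reproduces the counts in Figure 1 from an imaginary population of 1,000 boxes:

```python
total = 1000
bd = round(0.05 * total)            # 50 boxes from the defective line
not_bd = total - bd                 # 950 boxes from normal lines

bd_dl = 0.6 * bd                    # 30: defective line, tested unit fails
bd_not_dl = bd - bd_dl              # 20: defective line, tested unit passes
not_bd_dl = 0.04 * not_bd           # 38: normal line, tested unit fails
not_bd_not_dl = not_bd - not_bd_dl  # 912: normal line, tested unit passes

# Read the answers straight off the tree:
p_correct_defective = bd_dl / (bd_dl + not_bd_dl)             # 30/68, about 0.44
p_correct_good = not_bd_not_dl / (not_bd_not_dl + bd_not_dl)  # 912/932, about 0.98
```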
Now, we can easily answer the questions of interest. First, the probability that you correctly classify a defective box is

P(BD | DL) = 30 / (30 + 38) = 30/68 ≈ 0.44,

in which the denominator comes from selecting the two branches of the tree that contain "DL." The probability that you correctly classify a nondefective box is

P(not BD | not DL) = 912 / (912 + 20) = 912/932 ≈ 0.98.
Based on this, we would have to say this is not an effective strategy for determining which boxes are defective. We can feel quite confident that a box that you declare nondefective is, in fact, not defective, but there is less than a 50-50 chance of correctly identifying the defective boxes.
So how might we improve the test? One possibility would be to test three parts per box and declare the box defective if at least two of the three parts are defective. Note that P(BD) = 0.05 remains unchanged. So, what changes here? Using a binomial probability distribution,2 we have P(DL | BD) = P(all three parts fail) + P(exactly two parts fail) = 0.6³ + 3(0.6)²(0.4) = 0.648, and similarly P(DL | not BD) = 0.04³ + 3(0.04)²(0.96) = 0.00467. The new frequency tree in Figure 2 shows that P(BD | DL) = 32.4 / (32.4 + 4.4) ≈ 0.88 and P(not BD | not DL) = 945.6 / (945.6 + 17.6) ≈ 0.98, so both types of boxes are now classified correctly with high probability.
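The binomial calculation and the updated posterior can be checked with a short Python sketch (function name mine); `math.comb` supplies the binomial coefficients:

```python
from math import comb

def p_declare_defective(p_fail, n=3, threshold=2):
    # P(at least `threshold` of `n` independently tested parts fail),
    # from the binomial distribution.
    return sum(comb(n, k) * p_fail**k * (1 - p_fail)**(n - k)
               for k in range(threshold, n + 1))

p_dl_given_bd = p_declare_defective(0.6)       # 0.648
p_dl_given_not_bd = p_declare_defective(0.04)  # about 0.00467

prior = 0.05  # P(BD) is unchanged
posterior = (p_dl_given_bd * prior) / (
    p_dl_given_bd * prior + p_dl_given_not_bd * (1 - prior))
# posterior is roughly 0.88: a far more reliable decision rule
```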
Now, let’s come full circle to my colleague with the positive mammogram test result. First, we need a bit of background for her case. For illustration, we use the data in Gigerenzer’s example.3 Suppose that eight out of every 1,000 women in my colleague’s age and risk demographic actually have breast cancer (use P(BC) = 0.008). Next, we need to understand the effectiveness of mammography for diagnosis:
- If a woman has breast cancer, let’s assume that there is a 90% chance of a positive mammogram (in medical terms, this is the sensitivity of the test, which you can write as P(PM | BC) = 0.9).
- If a woman does not have breast cancer, then the probability of a negative test is 0.93 (in medical terms, called the specificity, which you can write as P(not PM | not BC) = 0.93).
On the surface, this seems to be a reasonably good test with correct results the vast majority of the time. Constructing another tree (Figure 3), however, we find that the probability of my colleague having breast cancer conditional on a positive mammogram,

P(BC | PM) = 7.2 / (7.2 + 69.4) ≈ 0.09,

leaves considerable doubt about what the conclusion should be.
Clearly, there is a large probability that a positive mammogram does not mean what many people assume it does, and the appropriate next step is follow-up with additional testing.
As an aside, if she had gotten a negative result on the mammogram, then the probability of the test having missed a cancer would be

P(BC | not PM) = 0.8 / (0.8 + 922.6) ≈ 0.0009,

which should be reassuring when the desired test result is obtained.
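For completeness, the mammogram tree can be tallied the same way as the production line tree. This Python sketch (variable names mine) uses the base rate, sensitivity, and specificity quoted above for 1,000 imaginary women:

```python
women = 1000
bc = 0.008 * women                  # 8 women with breast cancer
not_bc = women - bc                 # 992 women without

bc_pm = 0.9 * bc                    # 7.2 true positives (sensitivity 90%)
bc_not_pm = bc - bc_pm              # 0.8 false negatives
not_bc_not_pm = 0.93 * not_bc       # 922.56 true negatives (specificity 93%)
not_bc_pm = not_bc - not_bc_not_pm  # 69.44 false positives

p_bc_given_pm = bc_pm / (bc_pm + not_bc_pm)                  # about 0.09
p_bc_given_not_pm = bc_not_pm / (bc_not_pm + not_bc_not_pm)  # about 0.0009
```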
How did our intuition fail us in these examples? The manipulation of probabilities, particularly conditional ones, is mental gymnastics that few of us can do naturally. By translating the problem into natural frequencies, the base rate (the proportion of the population with the good outcome: nondefective boxes or no breast cancer in these two examples) is given the prominence it deserves.
After the frequency tree is constructed, you are easily able to extract the information that you need to answer a wide variety of questions. Bayes’ theorem is a powerful tool for dealing with conditional probabilities, but sometimes a tree is worth a thousand formulas.
References and note
- Gerd Gigerenzer, Calculated Risks: How to Know When Numbers Deceive You, Simon and Schuster, 2002.
- For more details on the binomial distribution, see Karl Bury, Statistical Distributions for Engineers, Cambridge University Press, 1999, chapter 6, or any introductory statistics textbook.
- Gigerenzer, Calculated Risks: How to Know When Numbers Deceive You, see reference 1, p. 41.
Christine M. Anderson-Cook is a research scientist in the statistical sciences group at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of both ASQ and the American Statistical Association.