Bayes' Theorem
Reference for Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B).
Update probabilities given new evidence.
Foundation of Bayesian inference, machine learning, and medical testing.
The Formula
Bayes' theorem describes how to update the probability of a hypothesis A given new evidence B. The posterior probability P(A | B) is computed from the likelihood P(B | A), the prior probability P(A), and the total probability of the evidence P(B).
Expanded Form
The denominator uses the law of total probability to express P(B) in terms of the hypothesis A and its complement: P(B) = P(B|A) × P(A) + P(B|¬A) × P(¬A), so the full formula becomes P(A|B) = P(B|A) × P(A) / [P(B|A) × P(A) + P(B|¬A) × P(¬A)]. This expanded form is more useful in practice because P(B) is usually not given directly.
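As a minimal sketch, the expanded form translates directly into code. The function and argument names below are illustrative, not from any standard library:

```python
def posterior(prior, likelihood, false_alarm):
    """Bayes' theorem with the law-of-total-probability denominator.

    prior       = P(A)
    likelihood  = P(B | A)
    false_alarm = P(B | not A)
    """
    evidence = likelihood * prior + false_alarm * (1 - prior)  # P(B)
    return likelihood * prior / evidence                       # P(A | B)
```

With the medical-test numbers used later (prior 0.01, likelihood 0.99, false-alarm rate 0.05), this returns about 0.167.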
Variables
| Symbol | Meaning |
|---|---|
| P(A) | Prior — probability of A before seeing evidence |
| P(B \| A) | Likelihood — probability of evidence B given that A is true |
| P(B) | Marginal probability of evidence B (any cause) |
| P(A \| B) | Posterior — updated probability of A after observing B |
| ¬A | The complement of A (A is false) |
Classic Example — Medical Test
A disease affects 1% of the population. A test is 99% sensitive (true positive rate) and 95% specific (true negative rate). If someone tests positive, what is the probability they actually have the disease?
P(Disease) = 0.01, P(No Disease) = 0.99
P(Positive | Disease) = 0.99
P(Positive | No Disease) = 1 − 0.95 = 0.05 (false positive rate)
P(Positive) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0099 + 0.0495 = 0.0594
P(Disease | Positive) = (0.99 × 0.01) / 0.0594
P(Disease | Positive) = 0.167 or about 17%
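The arithmetic above can be checked in a few lines of Python (variable names are my own):

```python
p_disease = 0.01                # prior: 1% prevalence
p_pos_given_disease = 0.99      # sensitivity (true positive rate)
p_pos_given_healthy = 1 - 0.95  # 1 - specificity = false positive rate

# Law of total probability: P(Positive)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(Disease | Positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # → 0.167
```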
This is the counterintuitive result that surprises most people: when the underlying condition is rare, a "99% accurate" test is wrong for about 83% of the people who test positive. The reason is the large pool of false positives drawn from the much larger healthy population.
Example — Spam Email Detection
Suppose 20% of emails are spam. The word "lottery" appears in 60% of spam and 1% of legitimate emails. If an email contains "lottery", what is the probability it is spam?
P(Spam) = 0.20, P(Not Spam) = 0.80
P("lottery" | Spam) = 0.60
P("lottery" | Not Spam) = 0.01
P("lottery") = 0.60 × 0.20 + 0.01 × 0.80 = 0.12 + 0.008 = 0.128
P(Spam | "lottery") = (0.60 × 0.20) / 0.128
P(Spam | "lottery") = 0.9375 or about 94%
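The same calculation as a quick sketch, again with my own variable names:

```python
p_spam = 0.20
p_word_given_spam = 0.60  # P("lottery" | Spam)
p_word_given_ham = 0.01   # P("lottery" | Not Spam)

# P("lottery") via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(Spam | "lottery")
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 4))  # → 0.9375
```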
This is the core mechanism of a naive Bayes spam classifier — combining many such word-level probabilities to score the overall message.
Intuition — Why Priors Matter
If a hypothesis is rare to begin with, even strong evidence may not lift it past 50%. If a hypothesis is common to begin with, even weak evidence can confirm it. The prior P(A) is not optional — it determines how much the posterior shifts from baseline.
| Prior P(A) | P(B|A) | P(B|¬A) | Posterior P(A|B) |
|---|---|---|---|
| 0.001 | 0.99 | 0.05 | 0.0194 |
| 0.01 | 0.99 | 0.05 | 0.167 |
| 0.10 | 0.99 | 0.05 | 0.687 |
| 0.50 | 0.99 | 0.05 | 0.952 |
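The table can be reproduced with a short loop (a sketch; the helper function is my own, not from the source):

```python
def posterior(prior, likelihood, false_alarm):
    # P(A | B) with the law-of-total-probability denominator
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

for prior in (0.001, 0.01, 0.10, 0.50):
    print(f"P(A) = {prior:<5}  ->  P(A|B) = {posterior(prior, 0.99, 0.05):.4f}")
```

This prints 0.0194, 0.1667, 0.6875, and 0.9519, matching the table up to rounding.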
When to Use It
- Medical diagnosis and screening test interpretation
- Spam filtering and document classification (naive Bayes)
- A/B test analysis and Bayesian inference
- Genetic risk assessment given family history
- Fault diagnosis in engineering systems given observed symptoms
- Legal reasoning about probability of guilt given physical evidence
- Search algorithms (Bayesian search for lost objects)
Sequential Updating
Bayes' theorem chains: today's posterior becomes tomorrow's prior. P(A | B₁, B₂) is computed by applying Bayes with P(A | B₁) as the new prior and B₂ as the new evidence (this requires B₂ to be conditionally independent of B₁ given A). This makes Bayesian updating naturally suited to streaming evidence — each observation refines the estimate without restarting from scratch.
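As an illustrative sketch, here is the medical test from earlier chained over two positive results, assuming the two results are conditionally independent given disease status (the helper function is my own):

```python
def posterior(prior, likelihood, false_alarm):
    # One Bayesian update: P(A | B) from prior, P(B|A), P(B|not A)
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

p = 0.01                      # prior: 1% prevalence
p = posterior(p, 0.99, 0.05)  # after 1st positive: ~0.167
p = posterior(p, 0.99, 0.05)  # after 2nd positive: ~0.798
```

A second independent positive lifts the posterior from about 17% to about 80%: one more update, no recomputation from scratch.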
The Common Mistake — Base Rate Neglect
The most common error in informal probabilistic reasoning is ignoring the prior. People presented with the test scenario above often answer "99% chance of disease" because they focus on the test's accuracy and overlook the disease's rarity. Bayes' theorem is the mathematical antidote to this — it forces you to combine the test result with the base rate.