Bayes' Theorem
Reference for Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B).
Update probabilities given new evidence.
Foundation of Bayesian inference, machine learning, and medical testing.
The Formula
Bayes' theorem describes how to update the probability of a hypothesis A given new evidence B. The posterior probability P(A | B) is computed from the likelihood P(B | A), the prior probability P(A), and the total probability of the evidence P(B).
Expanded Form
The denominator uses the law of total probability to express P(B) in terms of the hypothesis A and its complement: P(B) = P(B|A) × P(A) + P(B|¬A) × P(¬A), so the full formula becomes P(A|B) = P(B|A) × P(A) / [P(B|A) × P(A) + P(B|¬A) × P(¬A)]. This expanded form is more useful in practice because P(B) is usually not given directly.
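As a minimal sketch, the expanded form translates directly into code. The function and argument names below are illustrative, not from any standard library:

```python
def posterior(prior, likelihood, false_alarm):
    """Bayes' theorem with the law-of-total-probability denominator.

    prior       = P(A)
    likelihood  = P(B | A)
    false_alarm = P(B | not A)
    """
    evidence = likelihood * prior + false_alarm * (1 - prior)  # P(B)
    return likelihood * prior / evidence                       # P(A | B)
```

With the medical-test numbers used later (prior 0.01, likelihood 0.99, false-alarm rate 0.05), this returns about 0.167.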
Variables
| Symbol | Meaning |
|---|---|
| P(A) | Prior — probability of A before seeing evidence |
| P(B \| A) | Likelihood — probability of evidence B given that A is true |
| P(B) | Marginal probability of evidence B (any cause) |
| P(A \| B) | Posterior — updated probability of A after observing B |
| ¬A | The complement of A (A is false) |
Classic Example — Medical Test
A disease affects 1% of the population. A test is 99% sensitive (true positive rate) and 95% specific (true negative rate). If someone tests positive, what is the probability they actually have the disease?
P(Disease) = 0.01, P(No Disease) = 0.99
P(Positive | Disease) = 0.99
P(Positive | No Disease) = 1 − 0.95 = 0.05 (false positive rate)
P(Positive) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0099 + 0.0495 = 0.0594
P(Disease | Positive) = (0.99 × 0.01) / 0.0594
P(Disease | Positive) = 0.167 or about 17%
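The arithmetic above can be checked in a few lines of Python (variable names are my own):

```python
p_disease = 0.01                # prior: 1% prevalence
p_pos_given_disease = 0.99      # sensitivity (true positive rate)
p_pos_given_healthy = 1 - 0.95  # 1 - specificity = false positive rate

# Law of total probability: P(Positive)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(Disease | Positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # → 0.167
```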
This is the counterintuitive result that surprises most people: when the underlying condition is rare, a "99% accurate" test is wrong for about 83% of the people who test positive. The reason is the large pool of false positives drawn from the much larger healthy population.
Example — Spam Email Detection
Suppose 20% of emails are spam. The word "lottery" appears in 60% of spam and 1% of legitimate emails. If an email contains "lottery", what is the probability it is spam?
P(Spam) = 0.20, P(Not Spam) = 0.80
P("lottery" | Spam) = 0.60
P("lottery" | Not Spam) = 0.01
P("lottery") = 0.60 × 0.20 + 0.01 × 0.80 = 0.12 + 0.008 = 0.128
P(Spam | "lottery") = (0.60 × 0.20) / 0.128
P(Spam | "lottery") = 0.9375 or about 94%
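The same calculation as a quick sketch, again with my own variable names:

```python
p_spam = 0.20
p_word_given_spam = 0.60  # P("lottery" | Spam)
p_word_given_ham = 0.01   # P("lottery" | Not Spam)

# P("lottery") via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(Spam | "lottery")
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 4))  # → 0.9375
```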
This is the core mechanism of a naive Bayes spam classifier — combining many such word-level probabilities to score the overall message.
Intuition — Why Priors Matter
If a hypothesis is rare to begin with, even strong evidence may not lift it past 50%. If a hypothesis is common to begin with, even weak evidence can confirm it. The prior P(A) is not optional — it determines how much the posterior shifts from baseline.
| Prior P(A) | P(B|A) | P(B|¬A) | Posterior P(A|B) |
|---|---|---|---|
| 0.001 | 0.99 | 0.05 | 0.0194 |
| 0.01 | 0.99 | 0.05 | 0.167 |
| 0.10 | 0.99 | 0.05 | 0.687 |
| 0.50 | 0.99 | 0.05 | 0.952 |
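The table can be reproduced with a short loop (a sketch; the helper function is my own, not from the source):

```python
def posterior(prior, likelihood, false_alarm):
    # P(A | B) with the law-of-total-probability denominator
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

for prior in (0.001, 0.01, 0.10, 0.50):
    print(f"P(A) = {prior:<5}  ->  P(A|B) = {posterior(prior, 0.99, 0.05):.4f}")
```

This prints 0.0194, 0.1667, 0.6875, and 0.9519, matching the table up to rounding.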
When to Use It
- Medical diagnosis and screening test interpretation
- Spam filtering and document classification (naive Bayes)
- A/B test analysis and Bayesian inference
- Genetic risk assessment given family history
- Fault diagnosis in engineering systems given observed symptoms
- Legal reasoning about probability of guilt given physical evidence
- Search algorithms (Bayesian search for lost objects)
Sequential Updating
Bayes' theorem chains: today's posterior becomes tomorrow's prior. P(A | B₁, B₂) is computed by applying Bayes with P(A | B₁) as the new prior and B₂ as the new evidence (this requires B₂ to be conditionally independent of B₁ given A). This makes Bayesian updating naturally suited to streaming evidence — each observation refines the estimate without restarting from scratch.
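As an illustrative sketch, here is the medical test from earlier chained over two positive results, assuming the two results are conditionally independent given disease status (the helper function is my own):

```python
def posterior(prior, likelihood, false_alarm):
    # One Bayesian update: P(A | B) from prior, P(B|A), P(B|not A)
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

p = 0.01                      # prior: 1% prevalence
p = posterior(p, 0.99, 0.05)  # after 1st positive: ~0.167
p = posterior(p, 0.99, 0.05)  # after 2nd positive: ~0.798
```

A second independent positive lifts the posterior from about 17% to about 80%: one more update, no recomputation from scratch.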
The Common Mistake — Base Rate Neglect
The most common error in informal probabilistic reasoning is ignoring the prior. People presented with the test scenario above often answer "99% chance of disease" because they focus on the test's accuracy and overlook the disease's rarity. Bayes' theorem is the mathematical antidote to this — it forces you to combine the test result with the base rate.