Hash Collision Probability (Birthday Problem)
Calculate the probability of hash collisions using the birthday problem formula.
Understand collision resistance with worked examples.
The Birthday Problem Formula
The birthday problem asks: in a group of n people, what is the probability that at least two share the same birthday? The answer is surprisingly high — with just 23 people, there is a 50% chance of a shared birthday among 365 possible days.
This same mathematics applies directly to hash functions. If a hash function produces H possible output values and you hash n items, the probability that at least two items produce the same hash (a collision) follows the same pattern.
Understanding this is critical for security: hash collisions can be exploited to forge digital signatures, create fake certificates, or bypass authentication systems.
Expected Number of Items for 50% Collision
This approximation tells you roughly how many items you need to hash before a collision becomes likely. For a hash with b bits of output, H = 2b, so you need about 2b/2 hashes for a 50% collision chance.
Exact Formula (Birthday Problem)
The exact probability of no collision among n items with H possible values is the product of the probabilities that each successive item avoids all previous values.
Variables
| Symbol | Meaning |
|---|---|
| P | Probability of at least one collision |
| n | Number of items being hashed (or people in the room) |
| H | Number of possible hash values (or days in a year) |
| b | Number of bits in the hash output (H = 2b) |
Example 1: Classic Birthday Problem
In a room of 23 people, what is the probability that at least two share a birthday? (H = 365)
Using the approximation: P ≈ 1 − e−n²/(2H) = 1 − e−23²/(2×365)
= 1 − e−529/730 = 1 − e−0.7247
= 1 − 0.4843
P ≈ 0.516 or about 51.6% (the exact answer is 50.7% — the approximation is very close)
Example 2: Hash Function Collision
A 32-bit hash function produces H = 2³² ≈ 4.29 billion possible values. How many items must be hashed before there is a 50% chance of collision?
n ≈ 1.177 × √H = 1.177 × √(4.29 × 10⁹)
n = 1.177 × 65,536
n ≈ 77,163 items. Despite 4.29 billion possible hashes, collisions become likely after only about 77,000 items. This is why 32-bit hashes are not collision-resistant for large datasets.
Example 3: Cryptographic Hash
SHA-256 produces a 256-bit hash. How many hashes are needed for a 50% collision probability?
n ≈ 2b/2 = 2256/2 = 2128
n ≈ 2¹²⁸ ≈ 3.4 × 10³⁸ hashes. This is an astronomically large number — far beyond what any computer can compute, which is why SHA-256 is considered collision-resistant.
When to Use It
Use the birthday problem / hash collision formula in security and data structure design.
- Choosing appropriate hash function output sizes for security applications
- Estimating collision probability in hash tables and databases
- Understanding brute-force attack complexity against digital signatures
- Designing unique identifier systems (UUIDs, random tokens)
- Analyzing the strength of cryptographic hash functions