Chi-Square Test Formula
The chi-square test statistic χ² = Σ(O−E)²/E measures how observed frequencies differ from expected frequencies in categorical data.
The Formula
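The chi-square test compares observed counts with the counts you would expect under a specific (null) hypothesis. The further the observed values fall from the expected values, the larger the chi-square statistic becomes; a statistic above the critical value for the chosen significance level leads to rejecting the hypothesis.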
This test is used for categorical (count) data, not continuous measurements. The degrees of freedom determine which chi-square distribution, and therefore which critical value, the statistic is compared against.
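As a minimal sketch, the formula translates directly into a few lines of Python. The function name `chi_square_statistic` and the sample counts are illustrative assumptions, not part of any library.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories, 60 observations in total.
observed = [18, 22, 20]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))  # (4 + 4 + 0) / 20 = 0.4
```

Categories where the observed count matches the expected count contribute nothing to the total, which is why only the first two categories matter here.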
Variables
| Symbol | Meaning |
|---|---|
| χ² | Chi-square test statistic |
| O | Observed frequency (the actual count in each category) |
| E | Expected frequency (the count predicted by the hypothesis) |
| Σ | Sum across all categories |
| df | Degrees of freedom (number of categories minus 1 for goodness of fit; (rows − 1) × (columns − 1) for a test of independence) |
Common Critical Values (α = 0.05)
| df | Critical Value |
|---|---|
| 1 | 3.841 |
| 2 | 5.991 |
| 3 | 7.815 |
| 4 | 9.488 |
| 5 | 11.070 |
| 10 | 18.307 |
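Each critical value in the table is the point at which the upper-tail probability of the chi-square distribution equals α. The following is a hedged sketch using only Python's standard library; `chi2_sf` is a hypothetical helper that relies on the closed-form tail expressions valid for whole-number degrees of freedom. In practice a statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Assumes df is a positive whole number.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Critical values copied from the table; each tail probability should be close to 0.05.
table = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 10: 18.307}
for df, critical in table.items():
    print(df, critical, round(chi2_sf(critical, df), 4))
```

The same helper gives a p-value for any computed statistic: a p-value below α corresponds to a statistic above the critical value.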
Example 1
A die is rolled 60 times. You expect each face to appear 10 times. The observed counts are: 8, 12, 11, 7, 13, 9. Is the die fair at the 0.05 significance level?
Expected count for each face: E = 60/6 = 10
χ² = (8−10)²/10 + (12−10)²/10 + (11−10)²/10 + (7−10)²/10 + (13−10)²/10 + (9−10)²/10
χ² = 4/10 + 4/10 + 1/10 + 9/10 + 9/10 + 1/10
χ² = 0.4 + 0.4 + 0.1 + 0.9 + 0.9 + 0.1 = 2.8
df = 6 − 1 = 5; critical value at α = 0.05 is 11.070
χ² = 2.8 < 11.070, so we fail to reject the null hypothesis. The data give no significant evidence that the die is unfair.
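The arithmetic in this example can be checked with a short script. This is only a sketch; the variable names are illustrative, and the critical value is the df = 5 entry from the table of critical values.

```python
observed = [8, 12, 11, 7, 13, 9]               # counts for faces 1..6 from the example
expected_each = sum(observed) / len(observed)  # 60 / 6 = 10.0

# One (O - E)^2 / E term per face.
contributions = [(o - expected_each) ** 2 / expected_each for o in observed]
chi_square = sum(contributions)
df = len(observed) - 1
critical_value = 11.070  # df = 5, alpha = 0.05

for face, (o, c) in enumerate(zip(observed, contributions), start=1):
    print(f"face {face}: O = {o:2d}, (O - E)^2 / E = {c:.1f}")
print(f"chi-square = {chi_square:.1f}, df = {df}")
print("reject H0" if chi_square > critical_value else "fail to reject H0")
```

Printing the per-face terms makes it easy to see that faces 4 and 5 contribute most of the statistic.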
Example 2
A survey asks 200 people their preferred season. Expected: equal preference (50 each). Observed: Spring 65, Summer 55, Autumn 45, Winter 35. Is there a significant preference?
E = 200/4 = 50 for each season
χ² = (65−50)²/50 + (55−50)²/50 + (45−50)²/50 + (35−50)²/50
χ² = 225/50 + 25/50 + 25/50 + 225/50
χ² = 4.5 + 0.5 + 0.5 + 4.5 = 10.0
df = 4 − 1 = 3; critical value at α = 0.05 is 7.815
χ² = 10.0 > 7.815, so we reject the null hypothesis. There is a significant seasonal preference.
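The decision rule used in both examples (compute χ², look up the critical value for df, compare) can be wrapped in a small helper. The sketch below assumes equal expected counts and α = 0.05; `uniform_goodness_of_fit` and `CRITICAL_VALUES_05` are hypothetical names introduced for illustration.

```python
# Critical values at alpha = 0.05, copied from the table in this article.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 10: 18.307}


def uniform_goodness_of_fit(observed):
    """Test observed counts against equal expected counts at alpha = 0.05.

    Returns (chi_square, df, reject_null).
    """
    k = len(observed)
    df = k - 1
    if df not in CRITICAL_VALUES_05:
        raise ValueError(f"no tabulated critical value for df = {df}")
    expected = sum(observed) / k
    chi_square = sum((o - expected) ** 2 / expected for o in observed)
    return chi_square, df, chi_square > CRITICAL_VALUES_05[df]


seasons = {"Spring": 65, "Summer": 55, "Autumn": 45, "Winter": 35}
chi_square, df, reject = uniform_goodness_of_fit(list(seasons.values()))
print(chi_square, df, reject)  # 10.0 3 True
```

Running the same helper on the die counts from Example 1 returns `reject = False`, matching the conclusion reached there.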
When to Use It
The chi-square test applies to categorical data where you compare counts to expected values.
- Goodness of fit: does observed data match a theoretical distribution?
- Test of independence: are two categorical variables related?
- Genetics: testing Mendelian ratios (e.g., 3:1 or 9:3:3:1)
- Market research: comparing preferences across groups
- Quality control: checking if defect rates differ by production line
Important: as a rule of thumb, each expected count should be at least 5 for the chi-square approximation to be reliable. When some expected counts are smaller, consider combining categories.
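For the test of independence listed above, expected counts come from the table margins: E = (row total × column total) / grand total, with df = (rows − 1) × (columns − 1). The sketch below uses a made-up 2 × 2 table and a hypothetical helper named `independence_test`. It computes the statistic and also reports the smallest expected count so the at-least-5 guideline can be checked. It assumes every row and column total is positive.

```python
def independence_test(table):
    """Return (chi_square, df, min_expected) for a contingency table.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            min_expected = min(min_expected, expected)
            chi_square += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df, min_expected


# Hypothetical 2 x 2 table: rows are two groups, columns are two preferences.
table = [[30, 20],
         [20, 30]]
chi_square, df, min_expected = independence_test(table)
print(chi_square, df, min_expected)  # 4.0 1 25.0
if min_expected < 5:
    print("warning: some expected counts are below 5")
```

With df = 1 the critical value at α = 0.05 is 3.841, so a statistic of 4.0 would lead to rejecting independence for this illustrative table.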