Chi-Squared Distribution
The chi-squared test measures how well observed data fits expected values.
Learn the chi-squared formula and how to apply the test, with worked examples.
The Formula
The chi-squared (χ²) test is a statistical method used to determine whether there is a significant difference between observed frequencies and expected frequencies in categorical data. It was developed by Karl Pearson in 1900 and is one of the most widely used hypothesis tests.
The test statistic χ² sums, over all categories, the squared difference between the observed and expected frequency, each divided by the expected frequency. A large χ² value suggests the data do not fit the expected distribution, while a small value suggests a good fit.
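Written out, the statistic described here is Pearson's sum over the categories. The index bound k (the number of categories) is a symbol introduced for this sketch; Oᵢ and Eᵢ are as defined in the Variables table.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Each term is non-negative, so χ² is zero only when every observed frequency equals its expected frequency.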
The degrees of freedom (df) depend on the type of test. For a goodness-of-fit test, df = number of categories - 1. For a test of independence, df = (rows - 1) × (columns - 1). You compare the calculated χ² to a critical value from the chi-squared distribution table at your chosen significance level (usually α = 0.05): if χ² exceeds the critical value, you reject the null hypothesis H₀; otherwise you fail to reject it.
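The degrees-of-freedom rules and the critical-value comparison can be sketched in Python. The function names and the small lookup table are illustrative assumptions rather than part of any library; the table holds standard α = 0.05 critical values for df 1 to 6 only.

```python
# Illustrative sketch: helper names and the lookup table are assumptions.

def df_goodness_of_fit(num_categories):
    """Degrees of freedom for a goodness-of-fit test."""
    return num_categories - 1


def df_independence(rows, columns):
    """Degrees of freedom for a test of independence on a rows x columns table."""
    return (rows - 1) * (columns - 1)


# Critical values at alpha = 0.05 from a standard chi-squared table (df 1 to 6).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def rejects_null(chi_squared, df):
    """Return True when the statistic exceeds the alpha = 0.05 critical value."""
    return chi_squared > CRITICAL_VALUES_05[df]


print(df_goodness_of_fit(6))   # 5
print(df_independence(3, 4))   # 6
print(rejects_null(20.0, 3))   # True
print(rejects_null(1.0, 5))    # False
```

For other significance levels or larger df, the lookup table would need to be extended from a full chi-squared table.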
Variables
| Symbol | Meaning |
|---|---|
| χ² | Chi-squared test statistic |
| Oᵢ | Observed frequency for category i |
| Eᵢ | Expected frequency for category i |
| df | Degrees of freedom |
| α | Significance level (commonly 0.05) |
Example 1
A die is rolled 60 times. Results: 1→8, 2→12, 3→10, 4→9, 5→11, 6→10. Is it a fair die? Under H₀ (the die is fair), each face is expected 60/6 = 10 times.
χ² = (8-10)²/10 + (12-10)²/10 + (10-10)²/10 + (9-10)²/10 + (11-10)²/10 + (10-10)²/10
χ² = 4/10 + 4/10 + 0 + 1/10 + 1/10 + 0 = 0.4 + 0.4 + 0 + 0.1 + 0.1 + 0
df = 6 - 1 = 5. Critical value at α = 0.05 is 11.07.
χ² = 1.0. Since 1.0 < 11.07, we fail to reject H₀. The data give no significant evidence that the die is unfair.
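The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the critical value is taken from the example rather than computed.

```python
# Observed counts for faces 1..6, taken from the example.
observed = [8, 12, 10, 9, 11, 10]
total_rolls = sum(observed)               # 60
expected = total_rolls / len(observed)    # 10.0 per face under H0 (fair die)

# One term (O - E)^2 / E per face.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_squared = sum(contributions)

df = len(observed) - 1
critical_value = 11.07  # alpha = 0.05, df = 5 (from the example)

print([round(c, 2) for c in contributions])   # [0.4, 0.4, 0.0, 0.1, 0.1, 0.0]
print(round(chi_squared, 2), df, chi_squared > critical_value)   # 1.0 5 False
```

Printing the per-face contributions shows that faces 1 and 2 account for most of the statistic.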
Example 2
A survey asked 200 people their preferred season. Observed: Spring=60, Summer=70, Autumn=40, Winter=30. Test if preferences are equally distributed.
Expected (equal distribution): 200/4 = 50 for each season
χ² = (60-50)²/50 + (70-50)²/50 + (40-50)²/50 + (30-50)²/50
χ² = 100/50 + 400/50 + 100/50 + 400/50 = 2 + 8 + 2 + 8
df = 4 - 1 = 3. Critical value at α = 0.05 is 7.815.
χ² = 20. Since 20 > 7.815, we reject H₀. Preferences are not equally distributed.
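The same calculation can be sketched as a small reusable function that derives the expected counts from hypothesised proportions. The function name `goodness_of_fit` and its signature are assumptions made for illustration; the critical value is the one quoted in the example.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi-squared statistic, df) for observed counts against
    the hypothesised category proportions (which should sum to 1)."""
    total = sum(observed)
    expected = [total * p for p in proportions]
    chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_squared, len(observed) - 1


# Survey counts from the example.
survey = {"Spring": 60, "Summer": 70, "Autumn": 40, "Winter": 30}
counts = list(survey.values())

# H0: every season is equally preferred, so each proportion is 1/4.
equal_shares = [1 / len(counts)] * len(counts)

chi_squared, df = goodness_of_fit(counts, equal_shares)
critical_value = 7.815  # alpha = 0.05, df = 3 (from the example)

print(round(chi_squared, 3), df, chi_squared > critical_value)   # 20.0 3 True
```

Passing proportions instead of fixed expected counts lets the same sketch handle hypotheses with unequal shares.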
When to Use It
The chi-squared test is suited to count data, where each observation falls into one of several categories. Typical uses include:
- Testing whether a die, coin, or random process is fair
- Checking if survey responses match an expected distribution
- Testing independence between two categorical variables
- Genetic studies testing Mendelian inheritance ratios
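For the independence case, the expected count in each cell is (row total × column total) / grand total, and the statistic is summed over all cells. The sketch below assumes a small table of hypothetical counts invented purely for illustration; the function name is also an assumption.

```python
def independence_statistic(table):
    """Return (chi-squared statistic, df) for a contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_squared = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_squared += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_squared, df


# Hypothetical 2 x 2 table (made-up counts, not data from the text).
table = [
    [30, 20],
    [20, 30],
]

chi_squared, df = independence_statistic(table)
critical_value = 3.841  # alpha = 0.05, df = 1, from a standard table

print(round(chi_squared, 3), df, chi_squared > critical_value)   # 4.0 1 True
```

With these made-up counts every expected cell is 25, so each cell contributes 1 to the statistic.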