Correlation Coefficient Formula
Calculate the Pearson correlation coefficient r to measure the strength and direction of a linear relationship between two variables.
The Formula
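The Pearson correlation coefficient (r) measures the strength and direction of a linear relationship between two variables. It divides the sum of the products of paired deviations from the means by the square root of the product of the two sums of squared deviations. Its value ranges from -1 to +1.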
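Using the symbols from the Variables table and the same terms that appear in Step 5 of the worked examples, the standard form of the formula can be written as:

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \,\sum (y - \bar{y})^2}}
```

Each sum runs over all data pairs.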
Variables
| Symbol | Meaning |
|---|---|
| r | Correlation coefficient (-1 to +1) |
| x, y | Individual data values for the two variables |
| x̄, ȳ | Means of x and y respectively |
| Σ | Sum over all data pairs |
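As a minimal sketch of how these quantities combine, the function below computes r from two lists of numbers. The name `pearson_r` and the error messages are illustrative choices, not part of the original text, and the checks on the input are assumptions about what a caller would reasonably want.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences.

    Hypothetical helper for illustration only.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two data pairs")

    n = len(xs)
    x_bar = sum(xs) / n  # mean of x
    y_bar = sum(ys) / n  # mean of y

    # Numerator: sum of the products of paired deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)

    if s_xx == 0 or s_yy == 0:
        # A constant variable has no spread, so r is undefined
        raise ValueError("r is undefined when either variable is constant")

    return s_xy / math.sqrt(s_xx * s_yy)
```

For instance, `pearson_r([1, 2, 3], [2, 4, 6])` returns `1.0`, because the second list is an exact positive multiple of the first.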
Interpreting r
- r = +1: Perfect positive correlation (as x increases, y increases)
- r = 0: No linear correlation
- r = -1: Perfect negative correlation (as x increases, y decreases)
- |r| ≥ 0.7: Strong correlation (rule of thumb)
- 0.3 ≤ |r| < 0.7: Moderate correlation
- |r| < 0.3: Weak correlation
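The bands above can be turned into a small lookup. This is a sketch only: the function name `describe_r` is made up for illustration, the cut-offs are rules of thumb rather than fixed standards, and placing a value of exactly 0.3 or 0.7 in the higher band is a choice made here.

```python
def describe_r(r):
    """Return a plain-language label for a correlation coefficient.

    Hypothetical helper; thresholds follow the rule-of-thumb bands.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation"

    magnitude = abs(r)
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_r(0.993))   # strong positive correlation
print(describe_r(-0.45))   # moderate negative correlation
print(describe_r(0.1))     # weak positive correlation
```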
Example 1
Hours studied (x): 1, 2, 3, 4, 5 — Test scores (y): 50, 55, 65, 70, 80
Step 1: x̄ = (1+2+3+4+5)/5 = 3, ȳ = (50+55+65+70+80)/5 = 64
Step 2: Calculate (x - x̄)(y - ȳ) for each pair:
(1-3)(50-64) = 28, (2-3)(55-64) = 9, (3-3)(65-64) = 0, (4-3)(70-64) = 6, (5-3)(80-64) = 32
Step 3: Σ((x-x̄)(y-ȳ)) = 28 + 9 + 0 + 6 + 32 = 75
Step 4: Σ(x-x̄)² = 4+1+0+1+4 = 10, Σ(y-ȳ)² = 196+81+1+36+256 = 570
Step 5: r = 75 / √(10 × 570) = 75 / √5700 ≈ 75 / 75.50
r ≈ 0.993: Very strong positive correlation. More study hours are strongly linked to higher scores.
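The five steps of Example 1 can be checked with a short script. The variable names are illustrative; the data and the intermediate values in the comments are the ones from the example.

```python
import math

hours = [1, 2, 3, 4, 5]
scores = [50, 55, 65, 70, 80]

# Step 1: means of x and y
x_bar = sum(hours) / len(hours)      # 3.0
y_bar = sum(scores) / len(scores)    # 64.0

# Step 2: product of deviations for each pair
products = [(x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)]

# Step 3: sum of those products
sum_products = sum(products)         # 75.0

# Step 4: sums of squared deviations
sum_sq_x = sum((x - x_bar) ** 2 for x in hours)    # 10.0
sum_sq_y = sum((y - y_bar) ** 2 for y in scores)   # 570.0

# Step 5: divide by the square root of the product of the two sums
r = sum_products / math.sqrt(sum_sq_x * sum_sq_y)
print(round(r, 3))  # 0.993
```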
Example 2
Temperature °F (x): 60, 70, 80, 90, 100 — Hot cocoa sales (y): 50, 40, 25, 15, 5
Step 1: x̄ = 80, ȳ = 27
Step 2: Calculate (x - x̄)(y - ȳ) for each pair:
(60-80)(50-27) = -460, (70-80)(40-27) = -130, (80-80)(25-27) = 0, (90-80)(15-27) = -120, (100-80)(5-27) = -440
Step 3: Σ((x-x̄)(y-ȳ)) = -460 + (-130) + 0 + (-120) + (-440) = -1150
Step 4: Σ(x-x̄)² = 400+100+0+100+400 = 1000, Σ(y-ȳ)² = 529+169+4+144+484 = 1330
Step 5: r = -1150 / √(1000 × 1330) = -1150 / √1330000 ≈ -1150 / 1153.26
r ≈ -0.997: Very strong negative correlation. Higher temperatures are associated with lower cocoa sales.
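The same check for Example 2 also shows where the negative sign comes from: none of the Step 2 products is positive, so their sum, and therefore r, is negative. The variable names below are illustrative.

```python
import math

temps_f = [60, 70, 80, 90, 100]
cocoa_sales = [50, 40, 25, 15, 5]

x_bar = sum(temps_f) / len(temps_f)            # 80.0
y_bar = sum(cocoa_sales) / len(cocoa_sales)    # 27.0

# Step 2: above-average temperatures pair with below-average sales,
# so no product of deviations is positive
products = [(x - x_bar) * (y - y_bar) for x, y in zip(temps_f, cocoa_sales)]
no_positive_products = all(p <= 0 for p in products)   # True

sum_products = sum(products)                               # -1150.0
sum_sq_x = sum((x - x_bar) ** 2 for x in temps_f)          # 1000.0
sum_sq_y = sum((y - y_bar) ** 2 for y in cocoa_sales)      # 1330.0

r = sum_products / math.sqrt(sum_sq_x * sum_sq_y)
print(round(r, 3))  # -0.997
```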
When to Use It
Use the correlation coefficient to:
- Measure how strongly two variables are linearly related
- Check whether a linear model is appropriate for your data
- Explore relationships in research data before running a regression
- Compare the strength of relationships across different variable pairs
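When checking whether a linear model fits, it helps to remember that r captures only the linear part of a relationship. The sketch below uses made-up data (not from the examples above) and a hypothetical `pearson_r` helper: one y series is exactly linear in x, the other is fully determined by x but curved.

```python
import math


def pearson_r(xs, ys):
    """Compact Pearson r; assumes equal lengths and non-constant inputs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]   # exactly linear in x
curved = [x ** 2 for x in xs]      # determined by x, but not linear

print(pearson_r(xs, linear))  # 1.0
print(pearson_r(xs, curved))  # 0.0
```

An r near 0 therefore rules out a linear relationship in the data, not every kind of relationship, so a scatter plot is a useful companion to the number.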