Spearman Rank Correlation Formula
Spearman's rank correlation measures monotonic relationships between two variables using ranks instead of raw values.
It is a non-parametric alternative to Pearson correlation.
The Formula
r_s = 1 − (6 × Σd²) / (n × (n² − 1))
Spearman's rank correlation coefficient (r_s) measures the strength and direction of the monotonic relationship between two variables. Instead of using raw values, it converts each value to its rank within its own variable and then computes the correlation of the ranks. The result ranges from −1 (perfect decreasing monotonic relationship) to +1 (perfect increasing monotonic relationship), with 0 meaning no monotonic association.
Variables
| Symbol | Meaning |
|---|---|
| r_s | Spearman rank correlation coefficient (−1 to +1) |
| d | Difference between ranks for each pair: d_i = rank(x_i) − rank(y_i) |
| n | Number of data pairs |
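
As a rough illustration of how d and n enter the calculation, here is a minimal Python sketch of the shortcut formula r_s = 1 − 6Σd² / (n(n² − 1)). The function names `rank_distinct` and `spearman_no_ties` and the sample numbers are made up for this example, and the sketch assumes there are no tied values.

```python
def rank_distinct(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the shortcut formula does not apply")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]


def spearman_no_ties(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)) for paired data without ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs")
    rank_x, rank_y = rank_distinct(x), rank_distinct(y)
    # d_i = rank(x_i) - rank(y_i), squared and summed over all pairs
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Made-up illustration values: hours studied vs. exam score for five students.
hours = [2, 5, 1, 8, 4]
scores = [55, 70, 50, 90, 72]
print(spearman_no_ties(hours, scores))
```

Only the ranks matter here: replacing a score with any other value that keeps the same ordering leaves the result unchanged.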
When to use Spearman vs. Pearson:
- Ordinal data (rankings, survey scales): use Spearman
- Data with outliers: Spearman is far less sensitive to extreme values than Pearson
- Non-linear but monotonic relationships: use Spearman
- Approximately linear relationship between roughly normally distributed variables: use Pearson
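
To make the monotonic-versus-linear distinction concrete, the sketch below compares the two coefficients on data where y = x³. The helper names (`pearson`, `ranks`, `spearman`) and the data are illustrative assumptions, and the ranking helper assumes distinct values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ranks 1..n; assumes distinct values (no tie handling in this sketch)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative data: y = x**3 always increases with x but is far from linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Because the ordering of y matches the ordering of x exactly, Spearman reports a perfect relationship, while Pearson falls short of 1 because the points do not lie on a straight line.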
Example — Exam Rank Correlation
Five students are ranked in a math exam (X) and a physics exam (Y). Are the two rankings correlated?
Student ranks (X, Y): (1,1), (2,3), (3,2), (4,5), (5,4)
Differences d: 0, −1, 1, −1, 1
d²: 0, 1, 1, 1, 1 → Σd² = 4
r_s = 1 − (6 × 4) / (5 × (25 − 1)) = 1 − 24/120 = 1 − 0.2
r_s = 0.8 — strong positive correlation. Students who rank well in math tend to rank well in physics.
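
The arithmetic in this example can be double-checked with a few lines of Python. This is only a verification sketch: the variable names are arbitrary, and exact fractions are used so the result is not affected by floating-point rounding.

```python
from fractions import Fraction

# (math rank, physics rank) for the five students in the example
pairs = [(1, 1), (2, 3), (3, 2), (4, 5), (5, 4)]

n = len(pairs)
d = [x - y for x, y in pairs]      # rank differences
d_squared = [v * v for v in d]     # squared differences
sum_d2 = sum(d_squared)

# r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), computed exactly
r_s = 1 - Fraction(6 * sum_d2, n * (n ** 2 - 1))

print("d   :", d)
print("d^2 :", d_squared, "sum =", sum_d2)
print("r_s :", r_s, "=", float(r_s))   # 4/5 = 0.8
```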
When to Use It
Use Spearman rank correlation when:
- Your data is measured on an ordinal scale (e.g., satisfaction ratings, competition rankings)
- The relationship between variables is monotonic but not necessarily linear
- Your data contains significant outliers that would distort Pearson correlation
- One or both variables are not normally distributed
- You are analyzing paired rankings (e.g., judges ranking the same set of items)
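Note: The formula above is exact only when there are no tied ranks. With ties it gives inaccurate results, and the more ties there are, the larger the error tends to be. In that case, assign each group of tied values the average of the ranks they occupy and compute the Pearson correlation of the ranked data, which handles ties correctly. Most statistical software packages do this automatically.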
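
The tie-handling approach in the note can be sketched as follows, assuming the common convention of giving tied values the average of the ranks they span. The function names and the survey-style data are made up for illustration; this is not the code of any particular statistics package.

```python
from math import sqrt


def average_ranks(values):
    """Assign ranks 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def spearman_with_ties(x, y):
    """Spearman correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up 1-5 survey ratings containing repeated values.
rating_a = [3, 5, 3, 4, 1]
rating_b = [2, 5, 3, 3, 1]
print(average_ranks(rating_a))
print(round(spearman_with_ties(rating_a, rating_b), 4))
```

When no values are tied, this approach gives the same result as the shortcut formula, so it can be used as the default.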