Linear Regression Formula
Calculate the line of best fit with y = mx + b.
Predict outcomes using linear regression and understand the slope and intercept.
The Formula
y = mx + b
where m = Σ((x - x̄)(y - ȳ)) / Σ(x - x̄)²
and b = ȳ - m × x̄
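Linear regression finds the straight line that best fits a set of data points, meaning the line that minimizes the sum of squared vertical distances between the points and the line. The slope (m) shows how much the predicted y changes for each one-unit increase in x. The intercept (b) is the value of y where the line crosses the y-axis (x = 0).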
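The two formulas above translate almost directly into code. The following is a minimal Python sketch, not a production implementation; the function name `fit_line` and the choice to raise an error on degenerate input are assumptions made for illustration. It is run on the advertising data from Example 1.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator of m: sum of (x - mean_x)(y - mean_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator of m: sum of (x - mean_x) squared
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # b = mean_y - m * mean_x
    return slope, intercept


# Advertising spend vs. sales (both in $1000s), as in Example 1
slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(slope, 4), round(intercept, 4))  # prints: 0.6 2.2
```

On Python 3.10 or later, the standard library's `statistics.linear_regression` should return the same slope and intercept, so the hand-written version is mainly useful for seeing how the formula works.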
Variables
| Symbol | Meaning |
|---|---|
| y | Predicted value (dependent variable) |
| x | Input value (independent variable) |
| m | Slope of the line (rate of change) |
| b | Y-intercept (value of y when x = 0) |
| x̄, ȳ | Means of x and y data |
Example 1
Advertising spend (x, in $1000s): 1, 2, 3, 4, 5 — Sales (y, in $1000s): 2, 4, 5, 4, 5
Step 1: x̄ = 3, ȳ = 4
Step 2: Calculate Σ((x-x̄)(y-ȳ)):
(1-3)(2-4) = 4, (2-3)(4-4) = 0, (3-3)(5-4) = 0, (4-3)(4-4) = 0, (5-3)(5-4) = 2
Σ((x-x̄)(y-ȳ)) = 4 + 0 + 0 + 0 + 2 = 6
Step 3: Σ(x-x̄)² = 4 + 1 + 0 + 1 + 4 = 10
Step 4: m = 6 / 10 = 0.6
Step 5: b = 4 - (0.6 × 3) = 4 - 1.8 = 2.2
y = 0.6x + 2.2. For every $1,000 more in advertising, predicted sales increase by $600.
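One way to sanity-check this result is to compare the fitted line's total squared error against a couple of nearby lines. The sketch below assumes "best fit" in the least-squares sense, which is what the slope and intercept formulas compute; the helper name `sum_squared_error` and the two alternative lines are arbitrary choices for illustration.

```python
xs = [1, 2, 3, 4, 5]   # advertising spend, $1000s
ys = [2, 4, 5, 4, 5]   # sales, $1000s


def sum_squared_error(m, b):
    """Sum of squared vertical distances between the data and y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


fitted = sum_squared_error(0.6, 2.2)        # the line from Example 1
steeper = sum_squared_error(0.7, 2.2)       # same intercept, larger slope
lower = sum_squared_error(0.6, 2.0)         # same slope, smaller intercept

print(round(fitted, 2), round(steeper, 2), round(lower, 2))
```

The fitted line should show the smallest of the three values. Trying other slopes and intercepts gives the same pattern: any change away from m = 0.6, b = 2.2 increases the total squared error for this data.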
Example 2
Study hours (x): 2, 4, 6, 8 — Exam score (y): 65, 75, 85, 90
Step 1: x̄ = 5, ȳ = 78.75
Step 2: Calculate Σ((x-x̄)(y-ȳ)):
(2-5)(65-78.75) = 41.25, (4-5)(75-78.75) = 3.75, (6-5)(85-78.75) = 6.25, (8-5)(90-78.75) = 33.75
Σ((x-x̄)(y-ȳ)) = 41.25 + 3.75 + 6.25 + 33.75 = 85
Step 3: Σ(x-x̄)² = 9 + 1 + 1 + 9 = 20
Step 4: m = 85 / 20 = 4.25
Step 5: b = 78.75 - (4.25 × 5) = 78.75 - 21.25 = 57.5
y = 4.25x + 57.5. Each additional hour of study adds about 4.25 points to the predicted score. Predicted score for 7 hours: (4.25 × 7) + 57.5 = 29.75 + 57.5 = 87.25.
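Once the slope and intercept are known, making a prediction is a single multiplication and addition. A small Python sketch, with the constants copied from Steps 4 and 5 of this example and a hypothetical function name `predict_score`:

```python
SLOPE = 4.25       # m, from Step 4 of Example 2
INTERCEPT = 57.5   # b, from Step 5 of Example 2


def predict_score(hours):
    """Predicted exam score for a given number of study hours (y = mx + b)."""
    return SLOPE * hours + INTERCEPT


print(predict_score(7))  # prints: 87.25
```

The data only covers 2 to 8 study hours, so predictions far outside that range should be treated with caution.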
When to Use It
Use linear regression when you want to:
- Predict one variable based on another
- Identify trends in data (sales over time, performance vs. effort)
- Estimate the relationship between a suspected cause and its effect
- Create simple forecasting models for business or research