Runs don't just come from home runs — walks put runners on base too. Multiple linear regression lets you use several predictors at once. Same idea as Chapter 1, one step up. First, the short primer.

Five ideas — most of them you already know from Chapter 1.

Now we use two inputs: home runs (x₁) and walks (x₂). Both are numeric, and both plausibly drive scoring. Multiple linear regression (MLR) handles any number of predictors; we'll use two to keep the arithmetic readable.

The response is still a single number — runs scored (y). We're asking how runs depend on home runs and walks together.

One predictor gave a line. Two predictors give a tilted plane: y = β₀ + β₁x₁ + β₂x₂ + ε. β₀ is the intercept, β₁ and β₂ are the slopes for their own variables, and ε is the error the plane leaves unexplained.

This is the one genuinely new idea. β₁ is the effect of home runs while walks stay fixed — the extra scoring from a home run that walks can't already explain. Every coefficient is read this way.
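
To make "while walks stay fixed" concrete, here is a minimal sketch in plain Python that compares two slopes for home runs, using the six-team sample from this chapter. The helper name `centered_cross_sum` is made up for illustration, and the two-predictor slope uses the standard closed-form least-squares solution rather than anything specific to this course.

```python
# Data copied from the chapter's six-team table.
home_runs = [245, 221, 198, 214, 177, 162]   # x1
walks     = [520, 540, 505, 560, 480, 500]   # x2
runs      = [690, 696, 652, 706, 623, 632]   # y


def centered_cross_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b)), i.e. Chapter 1's Sxy (or Sxx when a == b)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


s11 = centered_cross_sum(home_runs, home_runs)
s22 = centered_cross_sum(walks, walks)
s12 = centered_cross_sum(home_runs, walks)
s1y = centered_cross_sum(home_runs, runs)
s2y = centered_cross_sum(walks, runs)

# One-predictor slope: walks are ignored, so home runs also pick up
# credit for the walks that tend to come along with them.
simple_slope = s1y / s11

# Two-predictor slope for home runs: closed-form least-squares solution
# for two predictors, which nets out the overlap with walks.
partial_slope = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

print(f"simple slope  (walks ignored):    {simple_slope:.3f}")
print(f"partial slope (walks held fixed): {partial_slope:.3f}")
```

On this sample the simple slope is close to 1.0 run per home run, while the partial slope is close to 0.5. The gap arises because teams that hit more home runs here also tend to draw more walks, so the one-predictor slope mixes the two effects.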

The coefficients come from one matrix formula, β̂ = (XᵀX)⁻¹Xᵀy — the grown-up version of Chapter 1's Sxy/Sxx. Then you judge the model with adjusted R² and an F-test.
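
Written out for two predictors, the matrix formula amounts to solving three simultaneous equations. The layout below is a sketch that assumes X carries a leading column of ones for the intercept (the usual convention, not spelled out above) and that every sum runs over the n teams.

```latex
X^\top X =
\begin{pmatrix}
  n          & \sum x_1      & \sum x_2      \\
  \sum x_1   & \sum x_1^2    & \sum x_1 x_2  \\
  \sum x_2   & \sum x_1 x_2  & \sum x_2^2
\end{pmatrix},
\qquad
X^\top y =
\begin{pmatrix}
  \sum y \\ \sum x_1 y \\ \sum x_2 y
\end{pmatrix},
\qquad
(X^\top X)\,\hat\beta = X^\top y .
```

With a single predictor the same system reduces to slope = Sxy/Sxx, which is why the matrix formula can be read as its generalization.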
| Team | Home runs (x₁) | Walks (x₂) | Runs (y) |
|---|---|---|---|
| Yankees | 245 | 520 | 690 |
| Dodgers | 221 | 540 | 696 |
| Red Sox | 198 | 505 | 652 |
| Astros | 214 | 560 | 706 |
| Cubs | 177 | 480 | 623 |
| Padres | 162 | 500 | 632 |
Three columns now: two predictors and one outcome. This teaching sample is small on purpose; the workflow is what transfers.
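
As a preview of the workflow, the sketch below fits the plane to the six teams in plain Python and computes adjusted R² and the F statistic. It solves the normal equations (XᵀX)β̂ = Xᵀy by Gaussian elimination instead of forming the inverse explicitly, which gives the same β̂. The `solve` helper and all variable names are illustrative assumptions, not part of the course material.

```python
# Rows copied from the chapter's table: (home runs, walks, runs).
teams = [
    (245, 520, 690),  # Yankees
    (221, 540, 696),  # Dodgers
    (198, 505, 652),  # Red Sox
    (214, 560, 706),  # Astros
    (177, 480, 623),  # Cubs
    (162, 500, 632),  # Padres
]

# Design matrix with a leading 1 for the intercept, and the response vector.
X = [[1.0, hr, bb] for hr, bb, _ in teams]
y = [float(r) for _, _, r in teams]
n, k = len(X), len(X[0])   # k = 3 columns: intercept + 2 predictors
p = k - 1                  # number of predictors

# Build X^T X (k x k) and X^T y (length k).
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (m[r][size] - tail) / m[r][r]
    return x


beta = solve(xtx, xty)  # [intercept, home-run slope, walk slope]

# Judge the fit: residual and total sums of squares.
fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
y_mean = sum(y) / n
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_mean) ** 2 for yi in y)

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat = ((sst - sse) / p) / (sse / (n - p - 1))   # df = (p, n - p - 1)

print("beta:", [round(b, 3) for b in beta])
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, F = {f_stat:.1f}")
```

On this sample the coefficients come out near 139.7 for the intercept, 0.51 for home runs, and 0.82 for walks. With only six teams the F statistic has just 3 residual degrees of freedom, so treat the near-perfect fit as an illustration of the steps rather than evidence about real scoring.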
We'll set up the matrices, read the coefficients, and judge the model — step by step.