We're going to answer that with one straight line — built by hand, from six baseball teams, one number at a time. No jargon left floating. First, the short primer so the example actually makes sense.
Five ideas. Each one is a piece of the puzzle we solve on the next page.

A home run is a hit that lets the batter circle all the bases and score on the same play, usually by sending the ball over the outfield fence. It's a clean counting stat: no opinions, no formulas, just a tally for the season. This is our predictor, the thing we already know and use as the input.

Runs are how you actually win — every time a player makes it all the way around to home plate, that's one run. We care about total runs scored across the season. This is the outcome we want to explain and predict.

Our dataset is six MLB teams — the Yankees, Dodgers, Red Sox, Astros, Cubs and Padres — each with two numbers: how many home runs they hit, and how many runs they scored. Six teams = six dots on a graph. The sample is intentionally small and clean so the mechanics stay visible.

Common sense says teams that hit more home runs probably score more runs. But how much more — and how reliable is the pattern? This is the Moneyball move: replace a gut feeling with a number you can actually trust.

One straight line through the dots, the line of best fit: ŷ = β̂₀ + β̂₁x. Here x is a team's home runs, ŷ is the predicted runs scored, β̂₀ is the intercept, and β̂₁ is the slope. You'll find the slope and intercept by hand, then measure how well the line fits with R². That's the entire worked example, in eight small steps.

| Team | Home runs (x) | Runs scored (y) |
|---|---|---|
| Yankees | 245 | 807 |
| Dodgers | 221 | 758 |
| Red Sox | 198 | 726 |
| Astros | 214 | 749 |
| Cubs | 177 | 691 |
| Padres | 162 | 668 |

Just two columns and six rows for learning the method. Real projects should use more data before making decisions.
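If you'd like to check your hand calculations later, here is a minimal Python sketch that fits the same line to the six rows in the table. It assumes the ordinary least-squares formulas (slope = Sxy / Sxx, intercept = ȳ − slope · x̄, R² = 1 − SS_res / SS_tot); the names `teams` and `fit_line` are made up for this illustration and are not part of the worked example.

```python
# Season totals copied from the table: team -> (home runs, runs scored).
teams = {
    "Yankees": (245, 807),
    "Dodgers": (221, 758),
    "Red Sox": (198, 726),
    "Astros": (214, 749),
    "Cubs": (177, 691),
    "Padres": (162, 668),
}


def fit_line(xs, ys):
    """Return (slope, intercept, r2) for a one-predictor least-squares fit."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Sxy: how x and y move together. Sxx: how much x varies on its own.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)

    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean

    # R^2 compares the line's leftover error with the error of
    # simply guessing the mean of y for every team.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2


home_runs = [hr for hr, _ in teams.values()]
runs = [r for _, r in teams.values()]

slope, intercept, r2 = fit_line(home_runs, runs)
print(f"runs = {intercept:.1f} + {slope:.2f} * home_runs, R^2 = {r2:.3f}")
```

With these six rows the sketch gives a slope of roughly 1.64 runs per home run, an intercept of about 400.6, and an R² of about 0.995. Treat those as a cross-check for the hand-built version, not a replacement for it.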
We'll go from raw numbers to a fitted model and R² — one step at a time.
Open the worked example