Low Stakes Stats · Chapter 1

Can home runs predict runs?

We're going to answer that with one straight line — built by hand, from six baseball teams, one number at a time. No jargon left floating. First, the short primer so the example actually makes sense.

The short primer

Everything you need, before any maths

Five ideas. Each one is a piece of the puzzle we solve on the next page.

Home runs (x)

A home run is when a batter hits the ball out of the park in one swing. It's a clean counting stat — no opinions, no formulas, just a tally for the season. This is our predictor: the thing we already know and use as the input.

Runs scored (y)

Runs are how you actually win — every time a player makes it all the way around to home plate, that's one run. We care about total runs scored across the season. This is the outcome we want to explain and predict.

Six teams, one teaching sample

Our dataset is six MLB teams — the Yankees, Dodgers, Red Sox, Astros, Cubs and Padres — each with two numbers: how many home runs they hit, and how many runs they scored. Six teams = six dots on a graph. The sample is intentionally small and clean so the mechanics stay visible.

The hunch

Common sense says teams that hit more home runs probably score more runs. But how much more — and how reliable is the pattern? This is the Moneyball move: replace a gut feeling with a number you can actually trust.

What you'll solve for

One straight line through the dots, the line of best fit: ŷ = β̂₀ + β̂₁x. You'll find its slope (β̂₁) and intercept (β̂₀) by hand, then measure how well it fits with R². That's the entire worked example, in eight small steps.
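
For reference, here are the standard least-squares formulas behind that line. This is a sketch in textbook notation, assuming x̄ and ȳ stand for the average home runs and average runs across the six teams; the eight steps of the worked example may arrange the same arithmetic differently.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Here n = 6 (one term per team), and ŷᵢ = β̂₀ + β̂₁xᵢ is the line's prediction for team i.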

The data you'll work with

Team      Home runs (x)   Runs scored (y)
Yankees   245             807
Dodgers   221             758
Red Sox   198             726
Astros    214             749
Cubs      177             691
Padres    162             668

Just two columns and six rows for learning the method. Real projects should use more data before making decisions.
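
If you'd like to check your hand arithmetic afterwards, here is a minimal Python sketch that fits the same line to the six teams using the usual least-squares formulas. The function names `fit_line` and `r_squared` are made up for this illustration and are not part of the worked example; only the twelve numbers come from the table.

```python
# Data copied from the table: (team, home runs x, runs scored y).
teams = [
    ("Yankees", 245, 807),
    ("Dodgers", 221, 758),
    ("Red Sox", 198, 726),
    ("Astros", 214, 749),
    ("Cubs", 177, 691),
    ("Padres", 162, 668),
]

xs = [hr for _, hr, _ in teams]
ys = [runs for _, _, runs in teams]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # How x and y move together, and how much x varies on its own.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in y that the line accounts for (hypothetical helper)."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


b0, b1 = fit_line(xs, ys)
r2 = r_squared(xs, ys, b0, b1)
print(f"intercept = {b0:.2f}, slope = {b1:.3f}, R^2 = {r2:.3f}")
```

Plain lists and `sum` are used on purpose so each line maps onto one step you would do by hand; a library routine would be shorter but would hide the mechanics.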

Got all five? Let's build the line.

We'll go from raw numbers to a fitted model and R² — one step at a time.
