Chapter 1 · In R

Let R do the arithmetic

You did it by hand to understand it. In practice nobody computes Sxy with a calculator — R fits the whole model in one line, and hands back the exact same numbers.

The lm() function as a machine: data in, line out



One function: lm()

lm stands for linear model. You give it a formula in the form response ~ predictor (read "y explained by x") and your data, and it returns the fitted model — slope, intercept and all.

R
# the same six teams from the worked example
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)

# fit runs explained by home runs; that's the whole calculation
fit <- lm(runs ~ hr, data = mlb)

That single lm() call does every step you ground out by hand: means, Sxx, Sxy, slope, intercept.
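To see that claim in action, the sketch below repeats the by-hand steps in plain R and sets the result next to what lm() returns. The helper names (x_bar, Sxx, slope_hand and so on) are only illustrative and are not part of R.

```r
# Same six teams as in the worked example
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)

x <- mlb$hr
y <- mlb$runs

# The by-hand steps, spelled out (names here are illustrative)
x_bar <- mean(x)
y_bar <- mean(y)
Sxx   <- sum((x - x_bar)^2)
Sxy   <- sum((x - x_bar) * (y - y_bar))
slope_hand     <- Sxy / Sxx
intercept_hand <- y_bar - slope_hand * x_bar

# The one-line fit
fit <- lm(runs ~ hr, data = mlb)

# Side-by-side comparison: the two columns should agree
comparison <- cbind(
  by_hand = c(intercept = intercept_hand, slope = slope_hand),
  lm      = unname(coef(fit))
)
print(comparison)
```

The two columns match to floating-point precision, because lm() solves the same least-squares problem that the Sxx and Sxy formulas solve.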


Read it with summary()

R console
> summary(fit)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    400.73      11.45    35.0  3.9e-06 ***
hr              1.639      0.056    29.3  8.1e-06 ***

Residual standard error: 3.79 on 4 degrees of freedom
Multiple R-squared: 0.9954

Every number you computed by hand is right there:

In the output              Value     That's our…
(Intercept)                400.73    β̂₀, the intercept (Step 4)
hr                         1.639     β̂₁, the slope (Step 4)
Residual standard error    3.79      s (Step 6)
Multiple R-squared         0.9954    R² (Step 7)

Same answers, zero arithmetic

R reproduces the hand-worked line ŷ = 400.73 + 1.639x; any difference in the trailing digits comes from rounding intermediate values by hand. The point of doing it by hand was to know what each of these numbers means.
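
The same four quantities can also be pulled out of the fitted model as plain numbers instead of being read off the printout. This is a minimal sketch using standard accessors from R's stats package; the variable names on the left (b, s, r2, dof) are illustrative. R works at full precision, so printed values may differ in the last digit from figures rounded by hand.

```r
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)
fit <- lm(runs ~ hr, data = mlb)

b   <- coef(fit)                # named vector: "(Intercept)" and "hr"
s   <- sigma(fit)               # residual standard error
r2  <- summary(fit)$r.squared   # Multiple R-squared
dof <- df.residual(fit)         # residual degrees of freedom, n - 2

print(b)
print(s)
print(r2)
print(dof)
```

Pulling values out this way is handy when a later calculation needs the slope or R² as a number rather than as text on the screen.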

The whole cheat-sheet

Test yourself · R

Four quick checks

Read the output below and answer. Type-and-check, same as before.

# a different model: runs allowed explained by walks allowed
> summary(lm(runs ~ walks, data = nl_east))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)     50.07 ...
walks           1.256 ...

Multiple R-squared: 0.998

a. Which R function fits a simple linear model? (1 mark)

Two letters. It stands for "linear model".

> fit <- lm(runs ~ walks, data = nl_east)

lm() — linear model. The formula is response ~ predictor.

b. Write the formula to model runs allowed from walks. (1 mark)

Form: response ~ predictor. The columns are runs and walks.

lm(runs ~ walks, data = nl_east)

Response on the left of ~, predictor on the right. (y ~ x is the general form.)
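
To illustrate why the side of the ~ matters, the sketch below fits both orderings. The nl_east data are not shown here, so it reuses the six-team mlb data from the worked example; the object names fit_runs_on_hr and fit_hr_on_runs are illustrative.

```r
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)

# runs is the response: "runs explained by hr"
fit_runs_on_hr <- lm(runs ~ hr, data = mlb)

# hr is the response: "hr explained by runs" (a different model)
fit_hr_on_runs <- lm(hr ~ runs, data = mlb)

slope_runs_on_hr <- unname(coef(fit_runs_on_hr)["hr"])
slope_hr_on_runs <- unname(coef(fit_hr_on_runs)["runs"])

print(slope_runs_on_hr)   # runs per extra home run
print(slope_hr_on_runs)   # home runs per extra run
```

Swapping the two sides gives a different line with a different slope, so the formula has to name the variable you want to predict on the left.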

c. From the output above, what is the slope (β̂₁)? (1 mark)

It's the Estimate on the walks row.

walks 1.256 ...

Each extra walk allowed → ≈ 1.256 more runs allowed, on average.
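
The "one more unit of x" reading of a slope can be checked directly. The nl_east data are not shown here, so this sketch assumes the six-team mlb data from the worked example and two hypothetical teams that differ by a single home run.

```r
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)
fit <- lm(runs ~ hr, data = mlb)

# Two hypothetical teams that differ by exactly one home run
new_teams <- data.frame(hr = c(200, 201))
pred <- predict(fit, newdata = new_teams)

gap   <- unname(diff(pred))        # change in predicted runs
slope <- unname(coef(fit)["hr"])   # fitted slope

print(gap)
print(slope)
```

The gap between the two predictions equals the slope, which is what "per extra unit, on average" means for a fitted line.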

d. From the output, what is R²? (1 mark)

Look for "Multiple R-squared".

Multiple R-squared: 0.998

Walks explain 99.8% of the variation in runs allowed.
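
"Share of variation explained" can be computed from the residuals. The nl_east data are not shown here, so this sketch assumes the six-team mlb data from the worked example; sse, sst and the r2_* names are illustrative.

```r
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)
fit <- lm(runs ~ hr, data = mlb)

sse <- sum(residuals(fit)^2)                  # variation left unexplained
sst <- sum((mlb$runs - mean(mlb$runs))^2)     # total variation in runs

r2_hand <- 1 - sse / sst                      # share explained by the line
r2_lm   <- summary(fit)$r.squared             # what summary() reports
r2_cor  <- cor(mlb$hr, mlb$runs)^2            # squared correlation

print(c(by_hand = r2_hand, lm = r2_lm, cor_squared = r2_cor))
```

All three values agree. With one predictor, R² is the squared correlation between x and y.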


4 / 4 — fluent in lm()!

By hand for understanding, in R for speed. That's the full toolkit for simple linear regression.
