In R — Fitting the line with lm()


One function: lm()

lm stands for linear model. You give it a formula in the form response ~ predictor (read "y explained by x") and your data, and it returns the fitted model — slope, intercept and all.

# the same six teams from the worked example
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)

# fit runs explained by home runs: that's the whole calculation
fit <- lm(runs ~ hr, data = mlb)

That single lm() call does every step you ground out by hand: means, Sxx, Sxy, slope, intercept.
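To see that claim in action, here is a minimal sketch that repeats the hand steps in R and compares them with lm(). The variable names (x_bar, Sxx, b1 and so on) are only illustrative labels for the quantities from the worked example, not anything lm() itself exposes.

```r
# Same six teams as in the worked example
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)

x <- mlb$hr
y <- mlb$runs

# The hand-calculation steps, written out one by one
x_bar <- mean(x)                        # mean of the predictor
y_bar <- mean(y)                        # mean of the response
Sxx   <- sum((x - x_bar)^2)             # spread of x
Sxy   <- sum((x - x_bar) * (y - y_bar)) # how x and y move together
b1    <- Sxy / Sxx                      # slope
b0    <- y_bar - b1 * x_bar             # intercept

# The one-line version
fit <- lm(runs ~ hr, data = mlb)

# Compare the two routes: intercept first, then slope
by_hand <- c(b0, b1)
from_lm <- unname(coef(fit))
print(all.equal(by_hand, from_lm))
```

Because R keeps full precision throughout, the last digits it prints may differ slightly from a hand calculation that rounds intermediate values.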


Read it with summary()

R console

> summary(fit)

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   400.73      11.45      35.0   3.9e-06 ***
hr              1.639      0.056     29.3   8.1e-06 ***

Residual standard error: 3.79 on 4 degrees of freedom
Multiple R-squared: 0.9954

Every number you computed by hand is right there:

In the output	Value	That's our…
(Intercept)	400.73	β̂₀ — intercept (Step 4)
hr	1.639	β̂₁ — slope (Step 4)
Residual standard error	3.79	s (Step 6)
Multiple R-squared	0.9954	R² (Step 7)

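Same answers, zero arithmetic. R reproduces the hand-worked line ŷ = 400.73 + 1.639x; because R carries full precision, its last digits can differ slightly from a hand calculation that rounds along the way. The point of doing it by hand was to know what each of these numbers means.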
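If you would rather pull these numbers out as values than read them off the printout, a sketch along the following lines should work. The object names (s, coef_table, resid_se and so on) are made up for illustration; the accessors themselves are standard parts of a summary() of an lm fit.

```r
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)
fit <- lm(runs ~ hr, data = mlb)
s   <- summary(fit)

# Coefficient table as a matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)

intercept <- coef_table["(Intercept)", "Estimate"]  # beta-hat-0
slope     <- coef_table["hr", "Estimate"]           # beta-hat-1
resid_se  <- s$sigma                                # residual standard error s
r_squared <- s$r.squared                            # Multiple R-squared
resid_df  <- df.residual(fit)                       # n - 2 for a simple regression

print(round(c(intercept = intercept, slope = slope,
              s = resid_se, R2 = r_squared), 4))
```

Each value maps onto one row of the table above, so this is a handy way to reuse them in later calculations without retyping.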


The whole cheat-sheet

lm(y ~ x, data = d) — fit the line
summary(fit) — coefficients, R², p-values, residual s
coef(fit) — just β̂₀ and β̂₁
predict(fit, newdata) — predict y for new x values
confint(fit) — 95% confidence intervals for the coefficients
plot(y ~ x, data = d); abline(fit) — scatter with the fitted line
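As a rough illustration of the predict() and confint() entries, the sketch below uses the six-team data from earlier. The home-run totals in new_teams are hypothetical values chosen only to show the call; the one thing that matters is that newdata is a data frame with a column named like the predictor.

```r
mlb <- data.frame(
  hr   = c(245, 221, 198, 214, 177, 162),
  runs = c(807, 758, 726, 749, 691, 668)
)
fit <- lm(runs ~ hr, data = mlb)

# newdata must be a data frame whose column name matches the predictor (hr)
new_teams <- data.frame(hr = c(180, 200, 230))  # hypothetical home-run totals
pred <- predict(fit, newdata = new_teams)

# The same predictions from the fitted line, intercept + slope * x
b <- coef(fit)
manual <- b[["(Intercept)"]] + b[["hr"]] * new_teams$hr

# 95% confidence intervals: one row per coefficient,
# columns "2.5 %" and "97.5 %"
ci <- confint(fit)

print(cbind(new_teams, predicted_runs = unname(pred)))
print(ci)
```

Passing a bare vector instead of a data frame is a common slip here; predict() looks up the predictor by column name.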

Test yourself · R

Four quick checks

Read the output below and answer. Type-and-check, same as before.


# a different model: runs allowed explained by walks allowed
> summary(lm(runs ~ walks, data = nl_east))

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    50.07   ...
walks           1.256  ...

Multiple R-squared: 0.998

a. Which R function fits a simple linear model? (1 mark)

Two letters. It stands for "linear model".


> fit <- lm(runs ~ walks, data = nl_east)

lm() — linear model. The formula is response ~ predictor.

b. Write the formula to model runs allowed from walks. (1 mark)

Form: response ~ predictor. The columns are runs and walks.


lm(runs ~ walks, data = nl_east)

Response on the left of ~, predictor on the right. (y ~ x is the general form.)

c. From the output above, what is the slope (β̂₁)? (1 mark)

It's the Estimate on the walks row.


walks 1.256 ...

Each extra walk allowed goes with about 1.256 more runs allowed, on average.

d. From the output, what is R²? (1 mark)

Look for "Multiple R-squared".


Multiple R-squared: 0.998

Walks explain 99.8% of the variation in runs allowed.


4 / 4 — fluent in lm()!

By hand for understanding, in R for speed. That's the full toolkit for simple linear regression.
