In R — Fitting MLR with lm()

Add a predictor with +

The formula y ~ x1 + x2 reads "y explained by x1 and x2". Everything else is identical to SLR.

# runs explained by home runs AND walks
mlb <- data.frame(
  hr    = c(245, 221, 198, 214, 177, 162),
  walks = c(520, 540, 505, 560, 480, 500),
  runs  = c(690, 696, 652, 706, 623, 632)
)
fit <- lm(runs ~ hr + walks, data = mlb)
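To make "identical to SLR" concrete, here is a minimal sketch that fits the one-predictor and two-predictor models side by side on the same six teams. The object names fit_slr, fit_mlr and fit_upd are illustrative, not part of the lesson; update() is base R and is shown only as an alternative way to add a term to an existing fit.

```r
# Same data as the lesson's example
mlb <- data.frame(
  hr    = c(245, 221, 198, 214, 177, 162),
  walks = c(520, 540, 505, 560, 480, 500),
  runs  = c(690, 696, 652, 706, 623, 632)
)

# SLR: one predictor on the right-hand side
fit_slr <- lm(runs ~ hr, data = mlb)

# MLR: the same call with one more term after +
fit_mlr <- lm(runs ~ hr + walks, data = mlb)

# Alternative: add the term to the existing fit instead of retyping the formula
fit_upd <- update(fit_slr, . ~ . + walks)

coef(fit_slr)  # intercept and hr
coef(fit_mlr)  # intercept, hr and walks
```

The hr coefficient will generally differ between the two fits, because in the MLR it is estimated with walks held fixed.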

Read it with summary()

R console

> summary(fit)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  139.676     13.782   10.13  0.00205 **
hr             0.508      0.031   16.42  0.00049 ***
walks          0.819      0.032   25.26  0.00014 ***

Residual standard error: 1.64 on 3 degrees of freedom
Multiple R-squared:  0.9987,    Adjusted R-squared:  0.9979
F-statistic: 1165 on 2 and 3 DF,  p-value: 6.1e-05

Everything from the worked example is in this one block:

In the output	Value	Meaning
hr	0.508	+0.51 runs per HR, holding walks fixed
walks	0.819	+0.82 runs per walk, holding HR fixed
Adjusted R-squared	0.9979	fit, penalised for 2 predictors
F-statistic	1165	whole model is significant (p-value 6.1e-05)
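The numbers in the table can also be pulled out of the fitted object rather than read off the printout. The sketch below does that, and checks the "holding walks fixed" reading by predicting for two teams that differ by exactly one home run. The two teams in the teams data frame are hypothetical values chosen for illustration.

```r
mlb <- data.frame(
  hr    = c(245, 221, 198, 214, 177, 162),
  walks = c(520, 540, 505, 560, 480, 500),
  runs  = c(690, 696, 652, 706, 623, 632)
)
fit <- lm(runs ~ hr + walks, data = mlb)
s <- summary(fit)

coef(s)           # matrix with Estimate, Std. Error, t value, Pr(>|t|)
s$adj.r.squared   # the Adjusted R-squared line
s$fstatistic      # named vector: value, numdf, dendf

# Overall p-value, computed from the F-statistic and its degrees of freedom
f <- s$fstatistic
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Hypothetical teams: same walks, one extra home run
teams <- data.frame(hr = c(200, 201), walks = c(520, 520))
pred <- predict(fit, newdata = teams)

# The gap between the two predictions is the hr coefficient
diff(pred)
coef(fit)["hr"]
```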

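Read the adjusted one

With more than one predictor, quote Adjusted R-squared when comparing models. Multiple R-squared never falls when a predictor is added, so only the adjusted figure penalises predictors that add little.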
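A small sketch of why the adjusted figure is the one to compare. It recomputes Adjusted R-squared from the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), then adds a column of random noise as a third predictor. The noise column and the seed are assumptions made purely for illustration; they are not part of the lesson's data.

```r
mlb <- data.frame(
  hr    = c(245, 221, 198, 214, 177, 162),
  walks = c(520, 540, 505, 560, 480, 500),
  runs  = c(690, 696, 652, 706, 623, 632)
)
fit <- lm(runs ~ hr + walks, data = mlb)
s <- summary(fit)

# Adjusted R-squared by hand: n observations, p predictors
n <- nrow(mlb)
p <- 2
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

# Illustrative only: a predictor that is pure noise
set.seed(1)
mlb$noise <- rnorm(n)
fit_noise <- lm(runs ~ hr + walks + noise, data = mlb)
s_noise <- summary(fit_noise)

# Multiple R-squared cannot go down when a term is added
c(two_predictors = s$r.squared, with_noise = s_noise$r.squared)

# Adjusted R-squared can go either way, which is what makes it comparable
c(two_predictors = s$adj.r.squared, with_noise = s_noise$adj.r.squared)
```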

The MLR cheat-sheet

lm(y ~ x1 + x2, data = d) — fit with two predictors (add more with +)
summary(fit) — coefficients, adjusted R², F-statistic
confint(fit) — 95% intervals for each coefficient
predict(fit, newdata) — predict y for new x₁, x₂
vif(fit) — check multicollinearity (predictors too alike?) — from the car package
anova(fit1, fit2) — does an extra predictor actually help?
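The cheat-sheet calls can be tried on the mlb data as in the sketch below. The new_team values are hypothetical. Because the car package may not be installed, the variance inflation factor is computed by hand here as 1 / (1 - R²), where R² comes from regressing one predictor on the other; with two predictors this should agree with what vif(fit) reports, but that comparison is not run here.

```r
mlb <- data.frame(
  hr    = c(245, 221, 198, 214, 177, 162),
  walks = c(520, 540, 505, 560, 480, 500),
  runs  = c(690, 696, 652, 706, 623, 632)
)
fit <- lm(runs ~ hr + walks, data = mlb)

# 95% intervals: one row per coefficient, lower and upper bound
ci <- confint(fit)

# Prediction for a hypothetical new team, with a prediction interval
new_team <- data.frame(hr = 200, walks = 520)
pred <- predict(fit, newdata = new_team, interval = "prediction")

# Does adding walks help? Compare the nested one-predictor model
fit1 <- lm(runs ~ hr, data = mlb)
cmp <- anova(fit1, fit)

# VIF for hr by hand (base R stand-in for car::vif)
r2_hr <- summary(lm(hr ~ walks, data = mlb))$r.squared
vif_hr <- 1 / (1 - r2_hr)

ci
pred
cmp
vif_hr
```

The columns of newdata must use the same names as the predictors in the formula (hr and walks here), otherwise predict() cannot match them.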

Test yourself · R

Four quick checks

All four questions are based on the output above.

a. Write the formula to model runs from hr and walks. (1 mark)

Add predictors with +.

formula

lm(runs ~ hr + walks, data = mlb)

b. From the output, what is the adjusted R²? (1 mark)

adj R²

Adjusted R-squared: 0.9979

c. Which function checks whether predictors are too alike (multicollinearity)? (1 mark)

Three letters — "variance inflation factor".

function

vif(fit) # from the car package

d. From the output, what is the coefficient on walks? (1 mark)

The Estimate on the walks row.

β̂ walks

walks 0.819

4 / 4 — lm() scales up.

One predictor or ten, it's the same call with more +s. The hard part is interpretation — which you've now done both ways.
