Class discussion

Textbook question, chapter 7 Q4

This is the full function, extending beyond -2, 2.

Do it yourself

Following the textbook lab exercise for Chapter 7. Read through pages 288-297, BUT use the code below.

  1. Explore the polynomial model fitting
  1. This builds from the polynomial model fit for the Wage data, using variables wage and age, in Figure 7.1.

The function poly is a convenient way to generate a fourth-degree polynomial. By default it uses “orthonormal polynomials”. Look up what an orthonomal polynomial is, on the internet.

library(tidyverse)
library(ISLR)
fit <- lm(wage~poly(age,4), data=Wage)
coef(summary(fit))
#                 Estimate Std. Error    t value     Pr(>|t|)
# (Intercept)    111.70361  0.7287409 153.283015 0.000000e+00
# poly(age, 4)1  447.06785 39.9147851  11.200558 1.484604e-28
# poly(age, 4)2 -478.31581 39.9147851 -11.983424 2.355831e-32
# poly(age, 4)3  125.52169 39.9147851   3.144742 1.678622e-03
# poly(age, 4)4  -77.91118 39.9147851  -1.951938 5.103865e-02

We can request that “raw” polynomials are generated instead, with the raw=TRUE argument.

fit2 <- lm(wage~poly(age,4, raw=TRUE), data=Wage)
coef(summary(fit2))
#                                Estimate   Std. Error   t value
# (Intercept)               -1.841542e+02 6.004038e+01 -3.067172
# poly(age, 4, raw = TRUE)1  2.124552e+01 5.886748e+00  3.609042
# poly(age, 4, raw = TRUE)2 -5.638593e-01 2.061083e-01 -2.735743
# poly(age, 4, raw = TRUE)3  6.810688e-03 3.065931e-03  2.221409
# poly(age, 4, raw = TRUE)4 -3.203830e-05 1.641359e-05 -1.951938
#                               Pr(>|t|)
# (Intercept)               0.0021802539
# poly(age, 4, raw = TRUE)1 0.0003123618
# poly(age, 4, raw = TRUE)2 0.0062606446
# poly(age, 4, raw = TRUE)3 0.0263977518
# poly(age, 4, raw = TRUE)4 0.0510386498

The coefficients are different, but effectively the fit is the same, which can be seen by plotting the fitted values from the two models.

wage_fit <- Wage %>% 
  mutate(yhat1 = predict(fit, Wage), yhat2 = predict(fit2, Wage))
ggplot(wage_fit, aes(x=yhat1, y=yhat2)) + geom_point() + theme(aspect.ratio = 1)

To examine the differences between orthonormal polynomials and “raw” polynomials, we can make scatterplot matrices of the two sets of polynomials.

library(GGally)
p_orth <- as_tibble(poly(Wage$age, 4))
ggscatmat(p_orth)

p_raw <- as_tibble(poly(Wage$age, 4, raw=TRUE))
ggscatmat(p_raw)