Textbook question, chapter 7 Q4

*This is the full function, extending beyond -2, 2.*

Following the textbook lab exercise for Chapter 7. Read through pages 288-297, *BUT* use the code below.

- Explore the polynomial model fitting

- This builds from the polynomial model fit for the Wage data, using variables wage and age, in Figure 7.1.

The function `poly`

is a convenient way to generate a fourth-degree polynomial. By default it uses “orthonormal polynomials”. Look up what an orthonomal polynomial is, on the internet.

```
library(tidyverse)
library(ISLR)
fit <- lm(wage~poly(age,4), data=Wage)
coef(summary(fit))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 111.70361 0.7287409 153.283015 0.000000e+00
# poly(age, 4)1 447.06785 39.9147851 11.200558 1.484604e-28
# poly(age, 4)2 -478.31581 39.9147851 -11.983424 2.355831e-32
# poly(age, 4)3 125.52169 39.9147851 3.144742 1.678622e-03
# poly(age, 4)4 -77.91118 39.9147851 -1.951938 5.103865e-02
```

We can request that “raw” polynomials are generated instead, with the `raw=TRUE`

argument.

```
fit2 <- lm(wage~poly(age,4, raw=TRUE), data=Wage)
coef(summary(fit2))
# Estimate Std. Error t value
# (Intercept) -1.841542e+02 6.004038e+01 -3.067172
# poly(age, 4, raw = TRUE)1 2.124552e+01 5.886748e+00 3.609042
# poly(age, 4, raw = TRUE)2 -5.638593e-01 2.061083e-01 -2.735743
# poly(age, 4, raw = TRUE)3 6.810688e-03 3.065931e-03 2.221409
# poly(age, 4, raw = TRUE)4 -3.203830e-05 1.641359e-05 -1.951938
# Pr(>|t|)
# (Intercept) 0.0021802539
# poly(age, 4, raw = TRUE)1 0.0003123618
# poly(age, 4, raw = TRUE)2 0.0062606446
# poly(age, 4, raw = TRUE)3 0.0263977518
# poly(age, 4, raw = TRUE)4 0.0510386498
```

The coefficients are different, but effectively the fit is the same, which can be seen by plotting the fitted values from the two models.

```
wage_fit <- Wage %>%
mutate(yhat1 = predict(fit, Wage), yhat2 = predict(fit2, Wage))
ggplot(wage_fit, aes(x=yhat1, y=yhat2)) + geom_point() + theme(aspect.ratio = 1)
```

To examine the differences between orthonormal polynomials and “raw” polynomials, we can make scatterplot matrices of the two sets of polynomials.

```
library(GGally)
p_orth <- as_tibble(poly(Wage$age, 4))
ggscatmat(p_orth)
```

```
p_raw <- as_tibble(poly(Wage$age, 4, raw=TRUE))
ggscatmat(p_raw)
```