- Total mark ______/25
- Readability/citation ______/5
- Reproducibility ______/5
- Answers ______/15

- This question explores the bias-variance trade-off. Read in the simulated data `possum_magic.rda`. This data is generated using the following function:

\[ y = 2x + 10\sin(x) + \varepsilon, ~~\text{where}~~x\in [-10, 20], ~~\varepsilon\sim N(0, 4^2)\]

- (1) Make a plot of the data, overlaying the true model.
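
As a rough sketch of this step, the following R code simulates stand-in data from the stated model and overlays the true curve using base graphics. It does not load `possum_magic.rda`, because the name of the object stored in that file is not given here. The sample size `n`, the seed, the uniform distribution of `x`, and the names `sim` and `true_f` are all assumptions made for illustration.

```r
# Stand-in data from y = 2x + 10 sin(x) + e, e ~ N(0, 4^2).
# Assumptions: n = 300, x drawn uniformly on [-10, 20], arbitrary seed.
set.seed(2024)
n <- 300
true_f <- function(x) 2 * x + 10 * sin(x)
sim <- data.frame(x = runif(n, min = -10, max = 20))
sim$y <- true_f(sim$x) + rnorm(n, mean = 0, sd = 4)

# Scatterplot of the data with the true model overlaid in red
plot(sim$x, sim$y, pch = 16, col = "grey50", xlab = "x", ylab = "y")
curve(true_f, from = -10, to = 20, n = 500, add = TRUE, col = "red", lwd = 2)
```

With the real data, the same two plotting calls should apply once the data frame from the `.rda` file is substituted for `sim`.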

- (1) Break the data into a \(2/3\) training set and a \(1/3\) test set. (Hint: You can use the function `createDataPartition` from the `caret` package.) Fit a linear model using the training set. Compute the training MSE and test MSE. Overlay the linear model fit on a plot of the data and true model.
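
One possible way to carry out the split and the linear fit is sketched below. To stay self-contained it again uses simulated stand-in data (assumed `n`, seed, and uniform `x`) and does the split with base R `sample` rather than `caret`; the names `train`, `test`, `fit_lm`, and `mse` are illustrative choices, not part of the assignment.

```r
# Stand-in data from the stated model (n, seed, uniform x are assumptions)
set.seed(2024)
n <- 300
true_f <- function(x) 2 * x + 10 * sin(x)
sim <- data.frame(x = runif(n, min = -10, max = 20))
sim$y <- true_f(sim$x) + rnorm(n, mean = 0, sd = 4)

# 2/3 training, 1/3 test split using base R.
# With caret installed, createDataPartition(sim$y, p = 2/3, list = FALSE)
# could supply the training row indices instead.
train_idx <- sample(seq_len(n), size = round(2 * n / 3))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# Linear model fitted on the training set only
fit_lm <- lm(y ~ x, data = train)

# Mean squared error helper (hypothetical name)
mse <- function(obs, pred) mean((obs - pred)^2)
train_mse <- mse(train$y, predict(fit_lm, newdata = train))
test_mse <- mse(test$y, predict(fit_lm, newdata = test))

# Data, true model (red), and linear fit (blue)
plot(sim$x, sim$y, pch = 16, col = "grey50", xlab = "x", ylab = "y")
curve(true_f, from = -10, to = 20, n = 500, add = TRUE, col = "red", lwd = 2)
abline(fit_lm, col = "blue", lwd = 2)
```

Because a straight line cannot follow the sine component, both MSE values should sit well above the noise variance of \(4^2 = 16\).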

- Now examine the behaviour of the training and test MSE for a `loess` fit.
  - (1) Look up the `loess` model fit, and write a paragraph explaining how this fitting procedure works. In particular, explain what the `span` argument does.

    `loess` fits a polynomial model on subsets of the data. The subsets are produced using a sliding window across the `x` variable. Within each window, the model is fitted by weighted least squares, with observations weighted by their distance from the centre of the window, and this local fit produces the fitted value at that `x`. By default, a quadratic polynomial is used. The `span` argument sets the fraction of the observations included in each window, so smaller values give a more flexible, wigglier fit.
  - (1) Compute the training and test MSE for a range of `span` values: 0.5, 0.3, 0.2, 0.1, 0.05, 0.01. Plot the training and test MSE against the span parameter. (For each model, also make a plot of the data and fitted model, just for yourself, but not to hand in.)
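
The span comparison could be sketched roughly as follows, again on simulated stand-in data. The sample size is an assumption, chosen large enough that even `span = 0.01` leaves several training points in each window, and the names `spans`, `res`, and `mse` are illustrative. Setting `surface = "direct"` in `loess.control` is a choice made here so that test points outside the range of the training `x` values still get predictions rather than `NA`.

```r
# Stand-in data from the stated model (n, seed, uniform x are assumptions)
set.seed(2024)
n <- 1500
true_f <- function(x) 2 * x + 10 * sin(x)
sim <- data.frame(x = runif(n, min = -10, max = 20))
sim$y <- true_f(sim$x) + rnorm(n, mean = 0, sd = 4)

# 2/3 training, 1/3 test split
train_idx <- sample(seq_len(n), size = round(2 * n / 3))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

mse <- function(obs, pred) mean((obs - pred)^2)
spans <- c(0.5, 0.3, 0.2, 0.1, 0.05, 0.01)

# Fit one loess model per span and record training and test MSE
res <- do.call(rbind, lapply(spans, function(s) {
  fit <- loess(y ~ x, data = train, span = s,
               control = loess.control(surface = "direct"))
  data.frame(span = s,
             train_mse = mse(train$y, predict(fit, newdata = train)),
             test_mse = mse(test$y, predict(fit, newdata = test)))
}))

# Training (black) and test (red) MSE against span, on a log x-axis
matplot(res$span, res[, c("train_mse", "test_mse")], type = "b", log = "x",
        col = c("black", "red"), lty = 1, pch = c(1, 2),
        xlab = "span", ylab = "MSE")
legend("topleft", legend = c("training", "test"),
       col = c("black", "red"), lty = 1, pch = c(1, 2))
```

The training MSE should keep falling as the span shrinks, while the test MSE is expected to turn upward once the fit starts chasing noise, which is the bias-variance trade-off the question is after.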

- (1) Look up the