These exercises are from the textbook.

- Work your way through Section 6.5.1 of the textbook.

Which variables make up the best eight-variable model?

Force the search to consider models of every size, up to all 19 predictors.
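Best subset selection is exhaustive, so it is worth seeing how large the search is. The short count below is written in Python for illustration only, since the textbook's lab is in R. It uses `math.comb` to count the \(\binom{19}{d}\) candidate models of each size \(d\). In the R lab, `regsubsets()` reports models with at most eight predictors by default, and its `nvmax` argument raises that limit.

```python
from math import comb

P = 19  # number of candidate predictors in the exercise

# Number of distinct models containing exactly d predictors, for d = 0..P
models_per_size = {d: comb(P, d) for d in range(P + 1)}
total_models = sum(models_per_size.values())

print(models_per_size[8])  # 75582 candidate eight-predictor models
print(total_models)        # 524288 subsets in all, including the intercept-only model
```

So the best eight-variable model is chosen from 75,582 candidates, out of \(2^{19}\) subsets in total.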

Plot the model fit diagnostics (RSS, adjusted \(R^2\), \(C_p\) and BIC) for the best model of each size.
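Here is a minimal Python sketch of three of these diagnostics, each computed from a model's training RSS. As far as we recall, it follows the \(C_p\), BIC and adjusted \(R^2\) forms in the textbook's discussion of choosing the optimal model, with \(\hat\sigma^2\) estimated from the largest model. Software packages scale \(C_p\) and BIC differently, so compare where the optimum falls rather than the raw values. `selection_criteria` is a hypothetical helper, and the RSS values are invented for illustration.

```python
import math

def selection_criteria(rss, tss, n, d, sigma2):
    """Cp, BIC and adjusted R^2 for a least squares model with d predictors.
    sigma2 is an estimate of the error variance, here taken from the largest model."""
    cp = (rss + 2 * d * sigma2) / n
    bic = (rss + math.log(n) * d * sigma2) / n
    adj_r2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))
    return cp, bic, adj_r2

# Hypothetical numbers: training RSS of the best model of each size
n, tss = 50, 1000.0
rss_by_size = {1: 600.0, 2: 420.0, 3: 400.0, 4: 395.0}
sigma2 = rss_by_size[4] / (n - 4 - 1)   # error variance estimated from the largest model

crit = {d: selection_criteria(r, tss, n, d, sigma2) for d, r in rss_by_size.items()}
best_cp = min(crit, key=lambda d: crit[d][0])    # smallest Cp
best_bic = min(crit, key=lambda d: crit[d][1])   # smallest BIC
best_adj = max(crit, key=lambda d: crit[d][2])   # largest adjusted R^2
print(best_cp, best_bic, best_adj)  # 3 2 3
```

Because \(\log n > 2\) whenever \(n > 7\), BIC penalises each extra predictor more heavily than \(C_p\). That is why BIC picks the smaller model in this made-up example.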

What do these diagnostics suggest about an appropriate choice of model? Do your results agree with the textbook's results? If not, why not?

Fit forward stepwise selection. How would the decision about the best model change?
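Forward stepwise selection replaces the exhaustive search with a greedy one. It starts from the intercept-only model and, at each step, adds the predictor that most reduces the training RSS. With \(p\) predictors that takes \(1 + p(p+1)/2\) fits (191 when \(p = 19\)) instead of \(2^p\). Below is a Python sketch on simulated data in which only columns 0 and 2 carry signal. `rss_of` and `forward_stepwise` are hypothetical helpers, not the `regsubsets(..., method = "forward")` call the R lab uses.

```python
import numpy as np

def rss_of(X, y, cols):
    """Training RSS of an OLS fit of y on an intercept plus the columns listed in cols."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_stepwise(X, y):
    """Greedy forward selection: path[d - 1] is the predictor set chosen at size d."""
    p = X.shape[1]
    chosen, path = [], []
    for _ in range(p):
        remaining = [j for j in range(p) if j not in chosen]
        # add the single predictor that most reduces the training RSS
        best = min(remaining, key=lambda j: rss_of(X, y, chosen + [j]))
        chosen = chosen + [best]
        path.append(chosen)
    return path

# Simulated data (hypothetical): only columns 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=100)

path = forward_stepwise(X, y)
for d, cols in enumerate(path, start=1):
    print(d, cols)
```

Choosing among the sizes on this path would then use the same diagnostics as for best subsets.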

Does the model change with backward stepwise selection?
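Backward stepwise selection runs the greedy search in the other direction. It starts from the full model and, at each step, drops the predictor whose removal increases the training RSS least. It needs \(n > p\) so that the full model can be fitted. Here is a Python sketch under the same assumptions (simulated data, hypothetical helper names):

```python
import numpy as np

def rss_of(X, y, cols):
    """Training RSS of an OLS fit of y on an intercept plus the columns listed in cols."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def backward_stepwise(X, y):
    """Greedy backward elimination: returns {model size d: predictors kept at size d}."""
    p = X.shape[1]
    current = list(range(p))
    path = {p: list(current)}
    while len(current) > 1:
        # drop the predictor whose removal raises the training RSS least
        drop = min(current, key=lambda j: rss_of(X, y, [k for k in current if k != j]))
        current = [k for k in current if k != drop]
        path[len(current)] = list(current)
    return path

# Simulated data (hypothetical): only columns 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=100)

path = backward_stepwise(X, y)
for d in sorted(path):
    print(d, path[d])
```

Comparing `path[d]` with the forward path, or with the exhaustive search, for each \(d\) shows whether the methods agree. They need not agree, because a greedy path never revisits earlier choices.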

- Now repeat the process with a training and test split, using the test set to help decide on the best model.

- Split the data into a 2/3 training set and a 1/3 test set.
- Fit best subset selection on the training set. Compute the mean squared error on the test set for the best model of each size. Which model does it suggest? Are the selected subsets similar to those produced on the full data set? Do your results agree with the textbook's results? If not, why not?
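The split-then-select steps can be sketched in Python as follows, using simulated data and hypothetical helper names. The key point is that the best subsets are found from the training rows only. The held-out third is used solely to compute the test MSE of the best model of each size. The exhaustive loop in `best_subsets` is fine for the five predictors used here, but the R lab's `regsubsets()` would be the more practical route for all 19.

```python
import itertools
import numpy as np

def design(X, cols):
    """Intercept column followed by the selected predictor columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])

def best_subsets(X, y):
    """Exhaustive search: for each size d, the column set with the smallest training RSS
    and its least squares coefficients. Only feasible for small p."""
    p = X.shape[1]
    best = {}
    for d in range(1, p + 1):
        for cols in itertools.combinations(range(p), d):
            A = design(X, cols)
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            if d not in best or rss < best[d][0]:
                best[d] = (rss, cols, beta)
    return {d: (cols, beta) for d, (_, cols, beta) in best.items()}

# Simulated data (hypothetical): only columns 0 and 2 matter
rng = np.random.default_rng(1)
n = 150
X = rng.normal(size=(n, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Random 2/3 training, 1/3 test split
idx = rng.permutation(n)
n_train = (2 * n) // 3
train, test = idx[:n_train], idx[n_train:]

fits = best_subsets(X[train], y[train])        # selection uses training rows only
test_mse = {}
for d, (cols, beta) in fits.items():
    pred = design(X[test], cols) @ beta
    test_mse[d] = float(np.mean((y[test] - pred) ** 2))

best_size = min(test_mse, key=test_mse.get)
print(best_size, fits[best_size][0], round(test_mse[best_size], 3))
```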

- Try again with cross-validation.

- 10-fold cross-validation is often said to be a reasonable choice for dividing the data. What sizes of training and validation sets would this create for this data? Argue whether this is good or bad. With your choice of an appropriate \(k\), conduct the cross-validation.
- The book talks about a “model matrix”. What is this? Why does the code for cross-validation need to use this?
- Plot the test error against model size, coloured by CV fold. Why does the test error vary by fold? What does the variation mean? What size model is suggested?
- How do your results compare with the textbook analysis? Can you explain any discrepancies?
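Here is a Python sketch of the cross-validation steps, assuming simulated data and hypothetical helper names. `fold_sizes` shows how \(k = 10\) would divide a data set of \(n\) rows. If the data are the lab's Hitters data with the 263 rows left after dropping missing Salary values, that gives folds of 26 or 27 rows, so each fit uses 236 or 237 rows. The rest of the sketch reruns best subset selection inside every fold. It stores one row of held-out MSE per fold, so the per-fold curves can be plotted and then averaged. Folds are assigned by shuffling balanced labels, which is one possible choice. `model_matrix` is a loose stand-in for R's `model.matrix()`: it builds an intercept column plus predictor columns but does no dummy coding of factors. Predictions are formed by multiplying the held-out design matrix by the fitted coefficients, much as the textbook lab does because `regsubsets()` has no `predict()` method.

```python
import itertools
import numpy as np

def fold_sizes(n, k):
    """Sizes of k folds that split n rows as evenly as possible."""
    return [n // k + (1 if i < n % k else 0) for i in range(k)]

def model_matrix(X, cols):
    """Hypothetical stand-in for R's model.matrix(): intercept plus chosen columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])

def best_subset_fits(X, y):
    """For each size d, the column set with the smallest training RSS and its coefficients."""
    p = X.shape[1]
    best = {}
    for d in range(1, p + 1):
        for cols in itertools.combinations(range(p), d):
            A = model_matrix(X, cols)
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            if d not in best or rss < best[d][0]:
                best[d] = (rss, cols, beta)
    return {d: (cols, beta) for d, (_, cols, beta) in best.items()}

def cv_errors(X, y, k, seed=0):
    """k x p array of held-out MSE: row f is fold f, column d - 1 is model size d."""
    n, p = X.shape
    fold = np.random.default_rng(seed).permutation(np.arange(n) % k)  # balanced, shuffled labels
    errors = np.zeros((k, p))
    for f in range(k):
        train, held_out = fold != f, fold == f
        fits = best_subset_fits(X[train], y[train])   # selection redone inside each fold
        for d, (cols, beta) in fits.items():
            pred = model_matrix(X[held_out], cols) @ beta
            errors[f, d - 1] = np.mean((y[held_out] - pred) ** 2)
    return errors

print(fold_sizes(263, 10))  # [27, 27, 27, 26, 26, 26, 26, 26, 26, 26]

# Simulated data (hypothetical): only columns 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=120)

errors = cv_errors(X, y, k=5)
mean_err = errors.mean(axis=0)          # average the per-fold curves
best_size = int(np.argmin(mean_err)) + 1
print(best_size, mean_err.round(3))
```

Each row of `errors` is one fold's curve. Plotting the rows against model size shows the fold-to-fold variation the exercise asks about, and `mean_err` is the cross-validation estimate used to pick a size.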

- Now we are going to examine regularisation, using the lasso.

- Using your results from questions 1-3, fit the best least squares model to your training set. Write down the mean squared error and the coefficient estimates for the final model. We’ll use these to compare with the lasso fit.
- Fit the lasso to a range of \(\lambda\) values. Plot the standardised coefficients against \(\lambda\). What does this suggest about the predictors?
- Conduct a cross-validation to choose \(\lambda\).
- Fit the final model using the best \(\lambda\). What are the estimated coefficients? What predictors contribute to the model?
- Does the best lasso model beat the best least squares model (best subsets)?
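The lasso steps can be sketched in self-contained Python, assuming simulated data and hypothetical helper names. The R lab uses `glmnet()` and `cv.glmnet()` instead. The sketch fits the lasso by coordinate descent with soft-thresholding on standardised predictors and minimises \(\frac{1}{2n}\lVert y - X\beta\rVert^2 + \lambda\sum_j\lvert\beta_j\rvert\), which as we understand it is the scaling glmnet uses for Gaussian models. Its \(\lambda\) values are therefore not on the same scale as the textbook's \(\text{RSS} + \lambda\sum_j\lvert\beta_j\rvert\). The grid starts at the smallest \(\lambda\) that sets every coefficient to zero. `path` holds the standardised coefficients for plotting against \(\lambda\), and 5-fold cross-validation picks \(\lambda\) before the final refit.

```python
import numpy as np

def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(Xs, yc, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||yc - Xs b||^2 + lam * sum|b_j|.
    Assumes standardised columns (mean 0, (1/n) x_j'x_j = 1) and a centred response."""
    n, p = Xs.shape
    b = np.zeros(p)
    r = yc.astype(float)                 # residual yc - Xs @ b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            r += Xs[:, j] * b[j]         # remove predictor j's current contribution
            b[j] = soft_threshold(Xs[:, j] @ r / n, lam)
            r -= Xs[:, j] * b[j]         # add back the updated contribution
    return b

def fit_lasso(X, y, lam):
    """Standardise X, centre y, fit, and return (intercept, coefficients on the original scale)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    b = lasso_cd((X - mu) / sd, y - y.mean(), lam)
    beta = b / sd
    return y.mean() - mu @ beta, beta

def cv_lasso(X, y, lambdas, k=5, seed=0):
    """Mean held-out MSE for each lambda, using k balanced random folds."""
    fold = np.random.default_rng(seed).permutation(np.arange(len(y)) % k)
    err = np.zeros((k, len(lambdas)))
    for f in range(k):
        tr, te = fold != f, fold == f
        for i, lam in enumerate(lambdas):
            b0, beta = fit_lasso(X[tr], y[tr], lam)
            err[f, i] = np.mean((y[te] - b0 - X[te] @ beta) ** 2)
    return err.mean(axis=0)

# Simulated data (hypothetical): only columns 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=120)

# Lambda grid from lam_max (all coefficients zero) down to lam_max / 1000
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()
lam_max = np.max(np.abs(Xs.T @ yc)) / len(y)
lambdas = np.geomspace(lam_max, lam_max / 1000, 30)

# Standardised coefficient path: one row per lambda, for plotting against lambda
path = np.array([lasso_cd(Xs, yc, lam) for lam in lambdas])

cv_err = cv_lasso(X, y, lambdas)
best_lam = lambdas[np.argmin(cv_err)]
b0, beta = fit_lasso(X, y, best_lam)
print(best_lam, b0, beta, np.flatnonzero(beta))
```

`np.flatnonzero(beta)` lists the predictors that remain at the chosen \(\lambda\). Comparing the lasso's test MSE with that of the best least squares model gives the comparison the final question asks for.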