Textbook question, chapter 4 Q8

*Logistic regression: 20% training error rate, 30% test error rate. KNN (K=1): average error rate of 18%, averaged over the training and test sets.*

*For KNN with K=1, the training error rate is 0%, because the nearest neighbour of any training observation is that observation itself. The 18% figure is the average of the training and test error rates, so KNN has a test error rate of 36%. I would choose logistic regression because of its lower test error rate of 30%.*
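The arithmetic behind the 36% can be written out as follows, assuming the reported 18% weights the training and test error rates equally (the symbols \(e_{\text{train}}\) and \(e_{\text{test}}\) are introduced here only for illustration):

\[
\frac{e_{\text{train}} + e_{\text{test}}}{2} = 18\%, \qquad e_{\text{train}} = 0\%
\quad\Longrightarrow\quad
e_{\text{test}} = 2 \times 18\% - 0\% = 36\% > 30\%.
\]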

- Run the K-nearest neighbours classification example in textbook section 4.6.5. The code below fits the model for \(k=1\).

```
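library(tidyverse)
library(ISLR)
library(class)
data(Smarket)
# Training set: observations before 2005
Smarket_tr <- Smarket %>%
  dplyr::filter(Year < 2005) %>%
  dplyr::select(Lag1, Lag2, Direction)
# Test set: observations from 2005 onwards
Smarket_ts <- Smarket %>%
  dplyr::filter(Year >= 2005) %>%
  dplyr::select(Lag1, Lag2, Direction)
# Columns 1:2 are the predictors (Lag1, Lag2), column 3 is the class (Direction)
knn.pred <- knn(Smarket_tr[,1:2], Smarket_ts[,1:2],
                Smarket_tr[,3], k=1)
# Confusion table: rows are predictions, columns are the true test classes
table(knn.pred, Smarket_ts[,3])
#
# knn.pred Down Up
#     Down   43 58
#     Up     68 83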
```

- Compute the test error for \(k=1\)
- Re-fit the model for \(k=3\), and compute the test error. How does this compare with the smaller \(k\)?
- Fit a range of values for \(k\), and find the best value.
- Would you put your money on this classification model when investing in stocks?
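One way to approach the questions above is sketched here. It first computes the test error implied by the \(k=1\) confusion table already shown, then wraps `knn()` in a small helper and tries several values of \(k\). The helper name `knn_test_error`, the range `1:20` and the seed are assumptions made for illustration, not part of the textbook example. No results are quoted for the other values of \(k\), because `knn()` breaks ties at random.

```r
library(ISLR)   # Smarket data
library(class)  # knn()

data(Smarket)
train <- Smarket$Year < 2005
x_tr <- Smarket[train, c("Lag1", "Lag2")]
x_ts <- Smarket[!train, c("Lag1", "Lag2")]
y_tr <- Smarket$Direction[train]
y_ts <- Smarket$Direction[!train]

# Test error implied by the k = 1 confusion table printed earlier:
# misclassified cases (off-diagonal) divided by all test cases.
conf_k1 <- matrix(c(43, 68, 58, 83), nrow = 2,
                  dimnames = list(pred  = c("Down", "Up"),
                                  truth = c("Down", "Up")))
err_k1 <- 1 - sum(diag(conf_k1)) / sum(conf_k1)  # (58 + 68) / 252
err_k1

# Hypothetical helper: fit KNN for a given k, return the test error rate
knn_test_error <- function(k) {
  pred <- knn(x_tr, x_ts, y_tr, k = k)
  mean(pred != y_ts)
}

set.seed(1)   # assumed seed; knn() breaks ties at random
ks <- 1:20    # assumed search range
errs <- sapply(ks, knn_test_error)
results <- data.frame(k = ks, test_error = errs)
results
results[which.min(results$test_error), ]  # k with the lowest test error
```

From the table alone, the \(k=1\) test error is \((58 + 68)/252 = 0.5\), which is no better than guessing. Comparing the other rows of `results` against this baseline is a reasonable starting point for the last question.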

- Run the linear discriminant analysis for the chocolates data from the lecture notes, and compute the training and test error.

- This data is an oldie, but a goodie, and contains physical measurements on three species of flea beetles. You can find it at http://www.ggobi.org/book/data/flea.csv.

*Source:* Lubischew, A. A. (1962), On the Use of Discriminant Functions in Taxonomy, Biometrics 18, 455–477.

Variable | Explanation |
---|---|
species | Ch. concinna, Ch. heptapotamica, and Ch. heikertingeri |
tars1 | width of the first joint of the first tarsus in microns |
tars2 | width of the second joint of the first tarsus in microns |
head | the maximal width of the head between the external edges of the eyes in 0.01 mm |
aede1 | the maximal width of the aedeagus in the fore-part in microns |
aede2 | the front angle of the aedeagus (1 unit = 7.5 degrees) |
aede3 | the aedeagus width from the side in microns |

- Read in the data, and make a scatterplot matrix, with the points coloured by species. Write a few sentences explaining what you learn about the data, and which variables seem to be most promising for distinguishing the species.

```
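library(MASS)
library(caret)
library(GGally)
# read_csv() comes from readr, which is loaded with tidyverse
flea <- read_csv("http://www.ggobi.org/book/data/flea.csv")
# Scatterplot matrix of the six measurements (columns 2 to 7), coloured by species
ggscatmat(flea, columns=2:7, color="species") +
  scale_colour_brewer(palette="Dark2")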
```