Class discussion

Do it yourself

These exercises replicate the plots made in the class notes.

  1. Compute the means, standard deviations and correlation of x and y for each dataset in the Datasaurus Dozen, and check that they are indeed all the same.
  2. Run a 2D projection grand tour for
    1. the 6D flea data
    2. the 7D women's track data
  3. Run a 2D guided tour
    1. using the holes index for the 6D flea data
    2. using the lda_pp index for the 6D flea data, using species as the class
    3. using the cmass index for the 7D women's track data
  4. Make a parallel coordinate plot for 6D flea data, coloured by species, with
    1. axes ordered by any class
    2. axes ordered by all classes
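
For the first task, it may help to see what "the same summary statistics" means in terms of the arithmetic. The following is a minimal sketch in plain Python, not the class's own code. The function name `summarise` and the two toy groups are made up for illustration and are not values from the Datasaurus Dozen. It computes the mean and sample standard deviation of each variable, and the Pearson correlation, for each group.

```python
import math

def summarise(xs, ys):
    """Return mean, sample standard deviation and Pearson correlation for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": math.sqrt(sxx / (n - 1)),  # sample sd, divisor n - 1
        "sd_y": math.sqrt(syy / (n - 1)),
        "cor": sxy / math.sqrt(sxx * syy),
    }

# Hypothetical toy groups (NOT Datasaurus values): same means and sds, opposite correlation
groups = {
    "a": ([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]),
    "b": ([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]),
}

stats = {name: summarise(xs, ys) for name, (xs, ys) in groups.items()}
for name, s in stats.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

In the toy groups the means and standard deviations agree but the correlation differs, so a table of rounded summaries like this is one way to check, group by group, which statistics match.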

Practice

This exercise uses the chocolates data.

About the data: The chocolates data was compiled by students in a previous class of Prof Cook, by collecting nutrition information on the chocolates as listed on their internet sites. All numbers were normalised to be equivalent to a 100g serving. Units of measurement are listed in the variable name.

  1. Use the tour, with type of chocolate mapped to colour, and write a paragraph on whether the two types of chocolate differ on the nutritional variables.

  2. Make a parallel coordinate plot of the chocolates, coloured by type, with the variables sorted by how well they separate the groups. The “uniminmax” scaling might work best for this data. Write a paragraph explaining how the types of chocolates differ in nutritional characteristics.

  3. Identify one dark chocolate that is masquerading as dark, that is, nutritionally looks more like a milk chocolate. Explain your answer.
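
The “uniminmax” scaling mentioned in question 2 rescales each variable separately so that its minimum becomes 0 and its maximum becomes 1, which puts variables measured in different units on a common axis range. The following is a minimal sketch of that arithmetic in plain Python. The function name `uniminmax`, the variable names and the numbers are assumptions made up for illustration, not values from the chocolates data.

```python
def uniminmax(columns):
    """Scale each variable separately to the range [0, 1].

    `columns` maps a variable name to a list of numeric values.
    A constant variable has no spread, so it is mapped to 0.0 throughout.
    """
    scaled = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        scaled[name] = [(v - lo) / span if span else 0.0 for v in values]
    return scaled

# Hypothetical variables on very different scales (NOT the chocolates data)
toy = {
    "var_a": [200.0, 400.0, 600.0],
    "var_b": [1.0, 2.0, 5.0],
}

scaled = uniminmax(toy)
for name, values in scaled.items():
    print(name, values)
```

Because each variable is scaled on its own, a variable with a large numeric range no longer dominates the plot, but differences in spread between variables are no longer visible.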