ISL Chapter 2

Required work for Chapter 2:

  • Conceptual questions: 1, 2, 3, 5, 6, 7.
  • Practical exercises: 9, 10.
  • Question 10: don’t get stuck on part (c); all of the other parts are more accessible.

Common confusions:

  • Question 10: how do I get the data? See “Getting the data” below.
  • Question 10 refers to “suburbs” of Boston, but it is not clear that the data points actually are suburbs. Treat each row (data point) as a “suburb”. Answer questions such as “which suburb” with a row reference such as “row 50 in the dataset”, or by showing the row itself (iloc positions start at 0):

    boston.iloc[50,:]
    
    crim         0.08873
    zn          21.00000
    indus        5.64000
    chas         0.00000
    [...]
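
To make the “suburb = row” convention concrete, here is a minimal sketch on a tiny made-up table. The three rows and their values are invented for illustration and are not taken from the Boston data; the name toy is hypothetical. It shows that iloc counts positions from zero and that idxmax returns the row label that answers a “which suburb” question.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data: three made-up rows.
toy = pd.DataFrame({
    "crim": [0.1, 3.5, 0.7],
    "zn": [18.0, 0.0, 12.5],
})

# iloc is zero-based, so position 1 is the second row.
second_row = toy.iloc[1, :]

# "Which suburb has the highest crime rate?" -> row label of the maximum.
worst = toy["crim"].idxmax()
print(worst)
print(toy.loc[worst, :])
```

With the default integer index, the label returned by idxmax is also the iloc position, so either can be quoted as “the suburb”.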

Getting the data

Most of the data is on the ISL web site. Direct links are listed under “Directly Downloadable Content” below.

Example of how to load the data:

import pandas as pd

df = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Auto.csv',
                 na_values='?', sep=',')  # '?' marks missing values in Auto

Common parameters:

  • na_values gives the string (or list of strings) used to represent missing data. It is only needed if the file uses a non-standard marker (sorry, that’s the Auto dataset, which uses '?').
  • sep is the column separator. The default is ","; use "\t" if the columns are separated by tabs instead of commas.
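
As a sketch of how these two parameters interact, the example below parses a small made-up tab-separated snippet from memory rather than from a URL. The column names, values, and the variable names raw and df_toy are assumptions for illustration only; the point is that '?' is turned into NaN, so the column stays numeric.

```python
import io
import pandas as pd

# Made-up tab-separated text with '?' marking a missing value.
raw = "name\tmpg\thorsepower\ncar_a\t18.0\t130\ncar_b\t25.0\t?\n"

# sep="\t" splits on tabs; na_values="?" maps '?' to NaN.
df_toy = pd.read_csv(io.StringIO(raw), sep="\t", na_values="?")

# Count the missing entries and inspect the inferred column types.
print(df_toy["horsepower"].isna().sum())
print(df_toy.dtypes)
```

Without na_values, the '?' would force the whole horsepower column to be read as text, which is why the Auto dataset needs it.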

Directly Downloadable Content

These links do not go to GitHub display pages; they go straight to the raw files. They are also automatically generated, so they will always be up to date.

  • lab-02-03.ipynb (14 kB)
  • Auto.data (20 kB)
  • Boston.data (36 kB)
  • College.csv (73 kB)
  • hitters.tsv (27 kB)

Changelog

  • 2018-09-19 Directly downloadable links fixed.
  • 2018-09-29 Cleanup with a focus on getting loaded and running.