Required work for Chapter 2:
Common confusions:
Question 10 refers to “suburbs” of Boston, but it is not clear that the data points are suburbs. Treat “suburb” as meaning “data point”, that is, one row of the dataset. Answer questions such as “which suburb” with “row 50 in the dataset”, or by showing the row like this:
boston.iloc[50,:]
crim 0.08873
zn 21.00000
indus 5.64000
chas 0.00000
[...]
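A question of the form “which suburb has the lowest value of some column” can be answered the same way, by reporting the row. The sketch below is only an illustration: the three-row table and its values are made up, and the column names crim and medv are assumed to match the Boston data.

```python
import pandas as pd

# Hypothetical stand-in for the Boston data: three rows, made-up values.
boston = pd.DataFrame({
    "crim": [0.10, 0.50, 0.25],
    "medv": [24.0, 5.0, 30.1],
})

# idxmin() returns the index label of the smallest value in the column.
lowest = boston["medv"].idxmin()
print("row", lowest, "in the dataset")

# .loc selects by label; with the default 0, 1, 2, ... index this is
# the same row that .iloc would select by position.
print(boston.loc[lowest, :])
```

With the default integer index, the label returned by idxmin() is also the row position, so it can be reported directly as “row N in the dataset”.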
Most of the data is on the ISL web site. Here are direct links:
Example of how to load the data:
import pandas as pd
df = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Auto.csv',
                 na_values='?', sep=',')
Common parameters:
na_values: a string used to represent missing data. Only needed if the data is not in a usual format (sorry, that’s the Auto dataset).
sep: the column separator. Could be "\t" if the columns are separated by tabs instead of commas.

These links do not go to GitHub display pages; they go straight to the raw source code. They are also automatically generated, so they will always be up to date.
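To see what na_values and sep do without downloading anything, here is a minimal sketch that reads a small in-memory table. The sample text is made up for illustration (it is not the real Auto data); it uses tabs between columns and '?' for a missing value.

```python
import io
import pandas as pd

# Made-up, tab-separated sample; '?' marks a missing value, as in Auto.csv.
raw = "mpg\thorsepower\tname\n18.0\t130\tcar a\n25.0\t?\tcar b\n"

# sep="\t" splits columns on tabs; na_values='?' turns '?' into NaN.
df = pd.read_csv(io.StringIO(raw), na_values='?', sep="\t")

print(df)
print(df["horsepower"].isna().sum(), "missing value(s) in horsepower")
```

Without na_values='?', the horsepower column would be read as text rather than as numbers, because '?' cannot be parsed as a number.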
2018-09-19 Directly downloadable links fixed.
2018-09-29 Cleanup with a focus on getting loaded and running.