Alcohol Analysis Lab/Quiz

You are allowed to use any individual resources you have, including but not limited to your old homework solutions, Stack Overflow, and the class web site. You may not consult another person (except the teacher). On the first reading, please skip questions you find difficult.

Setup Code

In [ ]:
# !pip install seaborn==0.9.0
In [1]:
import numpy as np
import pandas as pd
import scipy
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.simplefilter('ignore', FutureWarning)
In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv')

What are the columns used in the data set?

Make a new data frame whose columns are 'country', 'beer', 'spirit', 'wine', and 'total'.
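One way to get short column names is DataFrame.rename with a mapping. The sketch below uses a tiny made-up frame (`toy`, with invented values) rather than the downloaded data, and it assumes the CSV headers are `beer_servings`, `spirit_servings`, `wine_servings`, and `total_litres_of_pure_alcohol`; check `df.columns` before relying on those names.

```python
import pandas as pd

# Toy stand-in for the real data; all values are made up for illustration.
toy = pd.DataFrame({
    "country": ["A", "B"],
    "beer_servings": [100, 0],
    "spirit_servings": [50, 0],
    "wine_servings": [20, 0],
    "total_litres_of_pure_alcohol": [3.0, 0.0],
})

# Assumed original header names on the left, requested short names on the right.
short_names = {
    "beer_servings": "beer",
    "spirit_servings": "spirit",
    "wine_servings": "wine",
    "total_litres_of_pure_alcohol": "total",
}

# rename returns a new frame and leaves the original untouched.
drinks = toy.rename(columns=short_names)
print(list(drinks.columns))
```

Because rename returns a copy, the original frame stays available for the later questions that refer to the "original data".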

Plot the distribution of beer, wine, and spirits servings consumed in various countries.

Make a chart showing the relationships between beer, wine, and spirits (only).

Make a scatterplot showing the relationship between wine and spirits, coloring each dot using the amount of beer consumed.

How many countries consume more than 10 liters of pure alcohol per year?

How many claim zero consumption?

Make a new data frame that eliminates all of the zero consumption countries from the original data.
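Counting and dropping the zero-consumption rows can both be done with a boolean mask. This is a minimal sketch on a made-up frame; it assumes "zero consumption" means the `total` column equals 0, and the names `toy` and `non_dry` are hypothetical.

```python
import pandas as pd

# Toy stand-in with invented values; 'total' plays the role of total alcohol.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "total": [0.0, 4.9, 0.0, 10.4],
})

# A boolean Series sums to the number of True entries.
n_dry = (toy["total"] == 0).sum()

# Keep only rows with nonzero total; copy() avoids chained-assignment warnings later.
non_dry = toy[toy["total"] > 0].copy()

print(n_dry, len(non_dry))
```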

From here on, consider only the "non-dry" countries.

Among countries that consume alcohol, what is the median number of servings of beer consumed?

What are the top five countries in total alcohol consumption?

Produce a table showing how beer, wine, and spirit consumption is correlated (correlation coefficients).

Bonus: make a heat map showing the intensity of these relationships in a table.

Among alcohol consuming countries, what are the mean amounts of each category?

Among alcohol consuming countries, what are the standard deviations of each category?

How many countries drink more than 1 standard deviation above the mean of beer?

Make a box plot showing the distribution of the beer data (among alcohol-consuming countries).

Analyze the distribution of wine consumption in the top quartile of beer-drinking, alcohol-consuming countries...

Using only the top quartile of beer-drinking countries, plot the distribution of wine consumption. The top quartile is the part of the data above the point 75% of the way through the sorted list. You can find the cutoff for the top quartile with yourData.quantile(0.75) from Pandas; see the Pandas quantile documentation.
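The quantile cutoff described above can be sketched as follows. The frame `toy` and its values are made up for illustration, and the column names `beer` and `wine` are assumed to match the renamed data frame.

```python
import pandas as pd

# Toy stand-in with invented values.
toy = pd.DataFrame({
    "beer": [10, 20, 30, 40, 50, 60, 70, 80],
    "wine": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Value below which 75% of the beer data falls (linear interpolation by default).
cutoff = toy["beer"].quantile(0.75)

# Rows strictly above the cutoff form the top quartile.
top_quartile = toy[toy["beer"] > cutoff]

print(cutoff)
print(top_quartile)
```

From here, the `wine` column of `top_quartile` is what would be passed to a histogram or distribution plot.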

Of countries whose names begin with K through Q, what are the top five wine consumers?
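Selecting by first letter can be done with the `.str` accessor and Series.between, which is inclusive at both ends by default. A minimal sketch with a hypothetical list of names (no consumption values attached):

```python
import pandas as pd

# Hypothetical sample of country names, used only to show the mask.
names = pd.Series(["Kenya", "Japan", "Qatar", "Russia", "Mexico"])

# First character of each name, upper-cased to be safe, tested against 'K'..'Q'.
k_to_q = names.str[0].str.upper().between("K", "Q")

print(names[k_to_q].tolist())
```

The same mask can index a full data frame, after which sorting by a column and taking head(5) gives a top-five list.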

Make a scatterplot of beer vs spirit consumption for the countries beginning with 'K' through 'Q'.

Plotting error...

Per some article, estimate the alcohol contents of these drinks as:

  • beer 5%
  • wine 12%
  • spirits 40%

Use these numbers to estimate the total liters of alcohol consumed. Since these are not the exact numbers used in the data set, your estimates will not match the totals in the table exactly. Make a plot of the distribution of the errors (your total alcohol estimate minus the total given in the table).
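The estimate is a weighted sum: servings times serving volume times alcohol fraction, added over the three drinks. The quiz does not state serving volumes, so the sketch below assumes 0.355 L per beer, 0.148 L per glass of wine, and 0.044 L per shot of spirits (roughly 12, 5, and 1.5 US fluid ounces); those volumes, the frame `toy`, and its values are all assumptions for illustration.

```python
import pandas as pd

# Toy stand-in with invented values.
toy = pd.DataFrame({
    "beer": [100, 0],
    "wine": [50, 0],
    "spirit": [25, 0],
    "total": [3.0, 0.0],
})

# Assumed serving volumes in liters (not given in the quiz).
LITERS_PER_SERVING = {"beer": 0.355, "wine": 0.148, "spirit": 0.044}
# Alcohol fractions from the quiz text.
ALCOHOL_FRACTION = {"beer": 0.05, "wine": 0.12, "spirit": 0.40}

# Sum servings * volume * strength across the three drink types.
estimate = sum(
    toy[drink] * LITERS_PER_SERVING[drink] * ALCOHOL_FRACTION[drink]
    for drink in ("beer", "wine", "spirit")
)

# Error as defined in the quiz: estimate minus the table's total.
errors = estimate - toy["total"]
print(errors)
```

The `errors` Series is what would then be plotted as a histogram or density.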

Modify the original data set so that all of the beer servings more than 300 are set to 300.

Then find the mean number of servings of beer in the modified dataset.
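Capping values at a ceiling is what Series.clip does when given `upper`. A minimal sketch on made-up numbers (the frame `toy` is hypothetical):

```python
import pandas as pd

# Toy stand-in with invented values.
toy = pd.DataFrame({"beer": [100, 300, 350, 400]})

# Values above 300 become 300; everything else is unchanged.
toy["beer"] = toy["beer"].clip(upper=300)

capped_mean = toy["beer"].mean()
print(toy["beer"].tolist(), capped_mean)
```

An equivalent approach is boolean assignment with `.loc`, which modifies only the rows where the condition holds.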