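8a. smf.ols is how to do a regression.
8a.i. "Relationship" means the coefficient: look at the fitted coefficient and whether it is significantly different from zero.
8a.iv. This would be get_prediction on the fitted results and its summary_frame, which gives the prediction together with the confidence and prediction intervals.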
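The hints for 8a can be sketched roughly as follows. This is only an illustration: the data frame is synthetic, and the column names x and y and the value 98 are placeholders, not the variables of the actual exercise.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption; substitute the real data frame).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(50, 200, size=100)})
df["y"] = 40 - 0.15 * df["x"] + rng.normal(scale=3, size=100)

# Fit the simple regression y ~ x.
results = smf.ols("y ~ x", data=df).fit()

print(results.params)    # intercept and slope
print(results.pvalues)   # p-value for each coefficient

# Prediction at a new x value, with 95% confidence and prediction intervals.
new = pd.DataFrame({"x": [98]})
pred = results.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper",
            "obs_ci_lower", "obs_ci_upper"]])
```

In the summary frame, the mean_ci columns are the confidence interval for the average response and the obs_ci columns are the wider prediction interval for a single new observation.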
8b. sns.lmplot is the way to get the points and a regression line on the same plot.
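A minimal sketch of the lmplot call, assuming a data frame with placeholder columns x and y (synthetic here). The matplotlib.use line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (an assumption; substitute the real data frame).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(50, 200, size=100)})
df["y"] = 40 - 0.15 * df["x"] + rng.normal(scale=3, size=100)

# Scatter of the points plus the fitted least squares line.
g = sns.lmplot(data=df, x="x", y="y")
```

The shaded band around the line is seaborn's confidence interval for the fitted line.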
8c. Diagnostic plots means the plot_lm_summary function that I wrote. Follow the instructions on wy-cs.site.
sns.pairplot draws the scatterplot matrix.
corr() is a function that works on data frames.
smf.ols … you specify all of the variables you want to consider by listing them in the formula “x1+x2+stuff”. Statistically significant response means looking at results.pvalues.
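The correlation matrix and the multiple regression hints might look like this in practice. The data and the names x1, x2, x3 and y are made up for illustration, and the 0.05 cutoff is just a conventional choice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption): y depends on x1 and x2 only.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),   # unrelated to y by construction
})
df["y"] = 1 + 2 * df["x1"] - 3 * df["x2"] + rng.normal(size=n)

print(df.corr())   # pairwise correlation matrix of all columns

# List every predictor you want to consider on the right of the formula.
results = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(results.pvalues)

# Terms whose p-value falls below the chosen cutoff.
significant = results.pvalues[results.pvalues < 0.05].index.tolist()
print(significant)
```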
Use plot_lm_summary again.
In a formula, the “:” creates a new variable by multiplying. The “*” also includes the individual terms automatically, so “x1*x2” means “x1 + x2 + x1:x2”. You can include “np.log(x1)” in your formula. You may need to do np.power(horsepower, 2) if you want to consider what happens when a single predictor is squared. See the StatsModels formula page for more information.
10f. Comparing fit with different models can be done with the material from page 79ff.
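To see what the formula operators do, here is a small sketch that fits several formulas to the same made-up data and prints the terms each one produces. Comparing the R² values is one common way to compare fits; whether that matches the material on the cited pages is an assumption. The names x1, x2 and y are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption) with a genuine interaction effect.
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "x1": rng.uniform(1, 10, size=n),
    "x2": rng.uniform(1, 10, size=n),
})
df["y"] = 3 + df["x1"] + 0.5 * df["x1"] * df["x2"] + rng.normal(size=n)

formulas = {
    "main effects": "y ~ x1 + x2",
    "product only": "y ~ x1:x2",            # just the product term
    "full interaction": "y ~ x1 * x2",      # x1 + x2 + x1:x2
    "log term": "y ~ np.log(x1) + x2",
    "squared term": "y ~ x1 + np.power(x1, 2)",
}

fits = {}
for name, formula in formulas.items():
    fits[name] = smf.ols(formula, data=df).fit()
    # Show which terms the formula expanded to, and how well the model fits.
    print(name, list(fits[name].params.index), round(fits[name].rsquared, 3))
```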
10g. Outliers and high leverage points are visible on the summary plots.
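Opening paragraph:
Set the random seed with numpy.random.seed(N).
Create a normal distribution with scipy.stats.norm().
Draw the sample with whatever.rvs(size=50), where whatever is the distribution you created.
Change the standard deviation with the scale= parameter to the normal distribution.
11a. To perform a regression without an intercept, use the formula the text suggests: “y ~ x + 0”. You can get $t$-values and $p$-values from the instance variables tvalues and pvalues of the fitted results if you want to see them individually.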
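Putting the simulation hints and the no-intercept formula together, a sketch could look like the following. The seed, the sample size, the true slope of 2 and the noise scale of 0.5 are all arbitrary choices for illustration, not values taken from the exercise.

```python
import numpy as np
import pandas as pd
import scipy.stats
import statsmodels.formula.api as smf

np.random.seed(1)   # makes the random draws reproducible

# Standard normal predictor; scale= sets the standard deviation of the noise.
x = scipy.stats.norm().rvs(size=50)
noise = scipy.stats.norm(scale=0.5).rvs(size=50)
df = pd.DataFrame({"x": x, "y": 2 * x + noise})

# "+ 0" removes the intercept from the model.
results = smf.ols("y ~ x + 0", data=df).fit()

print(results.params)    # only a slope, no Intercept entry
print(results.tvalues)   # t-statistic for each coefficient
print(results.pvalues)   # p-value for each coefficient
```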
See number 11 for the functions needed.
13f. The population regression line could be drawn by creating a data frame with only two points in it (both lying on the desired line). Use sns.lineplot in the same cell as the scatterplot to draw the line.
Skip the legend if it does not appear automatically. Example code:
# Two points lying on the line to be drawn
trueLine = pd.DataFrame({'x': np.array([-3, 3]),
                         'y': np.array([-10, 10])})
sns.lineplot(data=trueLine, x='x', y='y')
See question 11 for a primer on simulated data.
14g. To add the bad data based on the original dataframe df:
# One-row data frame whose index continues after the last row of df
badData = pd.DataFrame({'x1': 0.1, 'x2': 0.8, 'y': 6},
                       index=[len(df)])
df2 = pd.concat([df, badData])
This question needs to be unpacked. Use the plot_lm_summary function that was provided in class to analyze leverage.
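If you also want numbers to go with the plots, statsmodels can report leverage and studentized residuals directly through get_influence. This is an alternative to the plot_lm_summary function, not a description of how it works. The simulated df below is a stand-in for the data frame you built earlier in the question, and its coefficients are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Stand-in simulated data with two collinear predictors (an assumption).
np.random.seed(1)
x1 = np.random.uniform(size=100)
x2 = 0.5 * x1 + np.random.normal(size=100) / 10
y = 2 + 2 * x1 + 0.3 * x2 + np.random.normal(size=100)
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

# Add the mismeasured observation as row 100.
badData = pd.DataFrame({"x1": 0.1, "x2": 0.8, "y": 6}, index=[len(df)])
df2 = pd.concat([df, badData])

results = smf.ols("y ~ x1 + x2", data=df2).fit()
influence = results.get_influence()

# Leverage (diagonal of the hat matrix) and studentized residuals per row.
leverage = pd.Series(influence.hat_matrix_diag, index=df2.index)
student = pd.Series(influence.resid_studentized_internal, index=df2.index)

print(leverage.sort_values(ascending=False).head())
print(student.abs().sort_values(ascending=False).head())
```

A row near the top of the leverage list has unusual predictor values; a row near the top of the residual list has an unusual response for its predictors. Repeating this with only x1 or only x2 in the formula shows how the answer depends on the model.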