This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This is a Python-based walkthrough of the lab in Section 3.6 of ISL. You should absolutely read the text, and consult this document for how to do the same things in Python.
There are several major topics in this lab.
The biggest difference between using R and using Python is that R gives you a quick graphical summary of every regression with a single command. I have not yet written a similar command for Python, though it could be done (should I?).
# !pip install seaborn==0.9.0
import numpy as np
import pandas as pd
import scipy
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import warnings
warnings.simplefilter('ignore',FutureWarning)
import sklearn.datasets
boston_all = sklearn.datasets.load_boston()
boston = pd.DataFrame(boston_all.data,
columns=['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
'ptratio', 'black', 'lstat'])
boston_independent = boston.columns
boston['medv'] = boston_all.target
boston_dependent = ['medv']
boston.columns
The boston_all.DESCR field contains the description of the columns.
#print(boston_all.DESCR) # commented out so I do not see all of the fields every time I run
lm_fit = smf.ols('medv ~ lstat', data=boston).fit()
In the book they write about "attaching" a data set. As far as I know there is no parallel in Python.
lm_fit.summary()
Notice that the $P>|t|$ column is not in scientific notation, so something like $10^{-16}$ renders as $0.000$. (The exact p-values are available separately; see below.)
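To verify, pull the exact p-values off the fit; pvalues is a standard attribute of the statsmodels results object.
# Exact p-values, at full precision (not rounded like the summary table)
lm_fit.pvalues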
The directory of all possible functions for a fit object is really huge, so I'm not printing it out. The ones with underscores at the start are "internal".
#dir(lm_fit)
lm_fit.params
Confidence intervals for the coefficient estimates are available with the conf_int
method. By default it gives 95% confidence intervals (2.5% in each tail, symmetric).
lm_fit.conf_int()
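If you want a different confidence level, conf_int takes an alpha argument (alpha is the total tail probability, so 0.05 is the default 95% interval). For example, 99% intervals:
# 99% confidence intervals for the coefficients
lm_fit.conf_int(alpha=0.01)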
x_for_predict = np.array([5,10,15])
x_df = pd.DataFrame(x_for_predict, columns=["lstat"]);
lm_fit.predict(x_df)
If you want all of the prediction information, get a prediction data frame. The mean_ci columns give the confidence interval for the mean response; the obs_ci columns give the prediction interval for a new observation. (Please let me know if this is wrong!)
predict_df = lm_fit.get_prediction(x_df)
predict_df.summary_frame()
A population parameter is the "true value" of an underlying trait of the population. Example: average height is a population parameter. You give an estimate of the average height along with a confidence interval.
The term 95% confidence interval means that if the process were repeated with randomly sampled data, 95% of the resulting intervals would contain the population parameter.
It is tempting to also read this as "the given interval is 95% likely to contain the true value of the population parameter," but strictly speaking the 95% describes the repeated-sampling procedure, not any single computed interval.
Warning: the confidence interval gives no information about the spread of the data itself. With enough samples the confidence interval shrinks no matter what the underlying distribution looks like.
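To make the repeated-sampling interpretation concrete, here is a small simulation (my own sketch, not part of the ISL lab; the population mean 10 and standard deviation 3 are arbitrary): draw many samples from a known population, compute a 95% t-interval for the mean each time, and count how often the interval covers the true mean.
# Coverage simulation: roughly 95% of the intervals should contain the true mean
np.random.seed(0)
true_mean, n_reps, n = 10.0, 2000, 50
hits = 0
for _ in range(n_reps):
    sample = np.random.normal(loc=true_mean, scale=3.0, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = scipy.stats.t.interval(0.95, n - 1, loc=sample.mean(), scale=se)
    hits += (lo <= true_mean <= hi)
hits / n_reps  # should be close to 0.95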
The prediction interval (the obs_ci columns in the table above) gives the interval that is 95% likely to contain a new observation, based on the estimated errors of the linear regression you fit.
There are decent explanations of these intervals elsewhere, including the tolerance interval, where you are 95% confident that 99% of the population lies within the interval.
Seaborn's linear model plot (lmplot) does the regression fitting for you.
sns.lmplot(x='lstat', y='medv', data=boston);
# Joint plot is a little fancier, shows the distributions of each axis as well.
# sns.jointplot(x='lstat', y='medv', kind='reg', data=boston);
The lab explains how to draw additional lines of any slope on a graph. If that is what you are doing, you should probably work directly with Matplotlib instead of Seaborn (see the sketch below). If you want different shaped markers, use the markers option of scatterplot.
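Here is a minimal Matplotlib sketch of that idea (my own example, not from the lab): scatter the data, draw the fitted line from lm_fit, and add an arbitrary extra line; the slope and intercept of the extra line are made up.
# Scatter plus fitted line plus an arbitrary comparison line, in plain Matplotlib
fig, ax = plt.subplots()
ax.scatter(boston['lstat'], boston['medv'], s=10)
xs = np.linspace(boston['lstat'].min(), boston['lstat'].max(), 100)
b0, b1 = lm_fit.params['Intercept'], lm_fit.params['lstat']
ax.plot(xs, b0 + b1 * xs, color='red', label='OLS fit')
ax.plot(xs, 40 - 1.5 * xs, color='green', linestyle='--', label='arbitrary line')
ax.legend();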
These plots are probably copy-and-paste jobs. I will try to show the simplest way to get each one (from statsmodels), as well as linking the more "beautiful" ones I found. There is a good reference for making R-like plots in Python; it shows where to find all of the data you need. (Random note: Seaborn could use another level of abstraction so you could hand the results of your own regression to the residual-plotting function.)
The model predicts a y value for each x value. The predicted y value is called the fitted value. There is one predicted y value for each x value in the data set. (Fitted value is $\hat{y}$.)
The difference between a point's y value and the y-value predicted for that point's given x is called the residual. (Residual is $y - \hat{y}$.)
Both of these are very commonly used. You should know how to get them out of the linear model (fittedvalues and resid).
Seaborn will just do the regression and everything for you.
sns.residplot(x='lstat', y='medv', data=boston);
Use the fitted values and residuals from the linear model (created by ols at the very start of this document).
x_predict = lm_fit.fittedvalues
y_residuals = lm_fit.resid
fig, ax = plt.subplots()
sns.scatterplot(x=x_predict, y=y_residuals, ax=ax)
ax.set_ylabel("residuals")
ax.set_xlabel("fitted values")
ax.set_title("Residuals vs fitted values");
The idea behind a Q-Q plot is to plot the quantiles of the residuals against the quantiles of a normal distribution. If the errors are normally distributed and the model fits well, you see a straight line. In other words, the Q-Q plot shows how close the residuals are to a normal distribution (strictly, studentized residuals follow a Student t distribution, which is close to normal for large samples). The plot below shows a poor fit for the model: percent of lower-status families is not linearly related to the median value of the homes.
Just use the StatsModels qqplot function.
sm.qqplot(y_residuals,line='q');
Just look this one up if you ever need to scan for "heteroskedasticity"... or copy the code / plot from the end. (A formal test is sketched below.)
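If you want a numeric check rather than eyeballing a plot, statsmodels has a Breusch-Pagan test. This is my own addition, not part of the ISL lab; a small p-value suggests heteroskedasticity.
# Breusch-Pagan test of the residuals against the model's design matrix
from statsmodels.stats.diagnostic import het_breuschpagan
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(lm_fit.resid, lm_fit.model.exog)
(lm_pvalue, f_pvalue)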
What is leverage?
The assumption behind the model is that the errors at every observation are independent and have the same variance. However, once you fit a linear regression through the data, the residuals at the different observations do not all have the same variance, even if the true errors do.
Maybe we can do the math and maybe not.
Fact for now: $\hat{y} = Hy$, where $H$ is the "hat matrix" that maps the observed responses to the fitted values of the linear regression. The diagonal entries $h_{ii}$ (the leverages) tell how the variance of the residual at observation $i$ relates to the variance of the true errors: Var(residual at observation $i$) = (true error variance) $\times (1-h_{ii})$.
Studentizing the residuals compensates for the differing variances at each point by dividing each residual by $\hat{\sigma}\sqrt{1-h_{ii}}$.
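As a sanity check on these formulas (my own sketch, not from the lab), build the hat matrix for the medv ~ lstat fit by hand and compare its diagonal, and the resulting studentized residuals, with what statsmodels reports.
# Hand-built hat matrix H = X (X'X)^{-1} X' vs statsmodels' influence measures
X = sm.add_constant(boston['lstat']).values      # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix: y_hat = H y
h_ii = np.diag(H)                                # leverages
influence = lm_fit.get_influence()
print(np.allclose(h_ii, influence.hat_matrix_diag))
sigma_hat = np.sqrt(lm_fit.mse_resid)
studentized = lm_fit.resid / (sigma_hat * np.sqrt(1 - h_ii))
print(np.allclose(studentized, influence.resid_studentized_internal))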
Influence plot = studentized residuals vs leverage. The influence plot shows whether the residuals are behaving as expected and how much each point is influencing the resulting regression.
fig, ax = plt.subplots(figsize=(20,20))
fig = sm.graphics.influence_plot(lm_fit, ax=ax)
plt.show()
If you need to track down the high-influence or high-leverage points by row number, read the resource blog mentioned at the start; it takes a bit of technical work to get the information you need.
lm_multiple = smf.ols('medv ~ lstat + age', data=boston).fit()
lm_multiple.summary()
When you want to use all of the predictors in your model, I like two choices: build the right-hand side of the formula by joining the column names (done below), or pass the design matrices directly to the sm.OLS method. Info: To eliminate a few of the variables, append subtractions to the formula, like "-dis-rad". The subtractions will override the additions in the formula (see the example after the full-model fit below).
multiple_formula_rhs = "+".join(boston_independent)
multiple_formula_rhs
lm_multiple = smf.ols('medv ~ ' + multiple_formula_rhs, data=boston).fit()
lm_multiple.summary()
lm_multiple.rsquared
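Here is the subtraction trick in action (a sketch; dropping age and indus is just an example choice):
# Full model minus two predictors; the "- age - indus" terms override the "+" terms
lm_minus = smf.ols('medv ~ ' + multiple_formula_rhs + ' - age - indus',
                   data=boston).fit()
lm_minus.params.index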
The book mentions a sequence of error estimators that can be used to select a good number of explanatory variables to use.
We will learn more about them in Chapter 6.
print(lm_multiple.aic)
print(lm_multiple.bic)
print(lm_multiple.rsquared_adj)
Mean squared error of the residuals = residual sum of squares divided by the residual degrees of freedom. Note that this is not the MSE from the text (where you always divide by the number of observations).
lm_multiple.mse_resid
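A quick check of that definition (my own sketch):
# mse_resid should equal the residual sum of squares over the residual degrees of freedom
np.isclose(lm_multiple.mse_resid,
           (lm_multiple.resid**2).sum() / lm_multiple.df_resid)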
The text mentions $VIF$, the variance inflation factor.
from statsmodels.stats.outliers_influence import variance_inflation_factor as vif
vif(boston[boston_independent].values,1)
DANGER: Mistake mistake mistake!! Keep reading.
# WRONG WAY!! DO NOT COPY!!
vif_data = [vif(boston[boston_independent].dropna().values, idx) for idx in range(len(boston_independent))]
vif_df = pd.DataFrame(vif_data, index = boston_independent)
vif_df = vif_df.transpose()
# I did not have any luck getting these into columns directly
vif_df
If you look, you will see the numbers do not agree with the ones from the R package... I am assuming the method used is slightly different. The values for indus and nox are really different... better check on this.
Solution: The "design matrix" needs to include a constant term. This would have been done automatically if we were using a design-matrix generator like the statsmodels.formula API (which uses the patsy package internally). Lots of gotchas!
TODO: Revise this section so you generate the formulas through the API.
bconstant = boston.copy()
bconstant['constant'] = 1
bc_indices = boston_independent.append(pd.Index(['constant']))
vif_data = [vif(bconstant[bc_indices].dropna().values, idx) for idx in range(len(bc_indices))]
vif_df = pd.DataFrame([vif_data], columns = bc_indices)
# Notice the list-of-lists input, so the data is understood as a single row of a 1xN frame
vif_df
R has an "update" function that lets you modify the formula but keep the other inputs to a regression the same. That does not seem like an important feature, and it does not fit the way StatsModels works, so just run the regression again!
smf.ols('medv ~ lstat * age', data=boston).fit().summary()
In StatsModels, you get the second power of lstat in a regression by writing I(lstat**2).
The R-compatible behavior in StatsModels/patsy is unexpected and hence annoying. Patsy says it supports ** as the exponentiation operator, but it's complicated. Really you can (need to!) write almost exactly the same kind of formula as in R, including the I(), so I(var**2). Except that in R you write I(var^2)... ugh.
Summary: write I(var**power) for powers; do not use ^, which in Python is the bitwise XOR operator.
lm_fit2 = smf.ols('medv ~ 1 + lstat + I(lstat**2)', data=boston).fit()
lm_fit2.summary()
Analysis of variance performs a statistical test to quantify how much better the second model is than the first one (and it works with longer sequences of models too). You could spend quite a bit of time reading about how this works if you wanted to.
Analysis of variance technical note: give the models in order of increasing complexity. Not symmetric! The warnings you see are fine - where you see NaN in the table, R just leaves blanks.
lm_fit = smf.ols('medv ~ lstat', data=boston).fit()
sm.stats.anova_lm(lm_fit, lm_fit2)
Read the book - a probability of $10^{-28}$ means it is virtually certain that the quadratic model is better than the linear one.
There is no poly function in StatsModels / patsy, and you cannot cheat by exponentiating (1 + var), so write one yourself.
def patsy_poly(var, n):
    """Return a string: var+I(var**2)+I(var**3)+...+I(var**n)."""
    if n == 0:
        return "1"  # degree 0: intercept-only formula
    elif n == 1:
        return var
    else:
        return patsy_poly(var, n-1) + "+I({}**{})".format(var, n)
#import patsy
#patsy.dmatrix('~' + patsy_poly('lstat',3), boston)
smf.ols("medv ~ " + patsy_poly('lstat',5), data=boston).fit().summary()
Let's use anova to analyze a sequence of models where we keep adding powers of lstat: polynomials from degree 0 (intercept only) up to degree 7.
manymodels = \
[ smf.ols("medv ~ " + patsy_poly('lstat',k), data=boston).fit()
for k in range(8) ]
sm.stats.anova_lm(*manymodels)
Notice that model 6 (the degree-6 polynomial) is not a significant improvement over the degree-5 polynomial: the p-value of roughly 0.20 means there is about a 20% chance of seeing this much improvement purely by chance.
carseats = pd.read_csv('http://vincentarelbundock.github.io/Rdatasets/csv/ISLR/Carseats.csv');
carseats.drop(columns='Unnamed: 0', inplace=True);
carseats.head()
carseats_independent = set(carseats.columns) - set(['Sales'])
carseats_dot = '+'.join(carseats_independent)
smf.ols( 'Sales ~ ' + carseats_dot + " + Income:Advertising + Price:Age", data=carseats).fit().summary()
Categorical variables take only a few values. They are encoded as dummy (0/1) indicator variables. (See book.)
The contrasts from R show up as variables like "ShelveLoc[T.Good]". I read this one as "1 (True) when ShelveLoc is Good". A quick way to see the encoding is sketched below.
This guide is written on the assumption that you have some basic knowledge of programming in Python. That means you already know how to write functions.
It would make sense to me to keep the hat matrix after any linear regression.
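In that spirit, here is a hypothetical helper (my own sketch, assuming a statsmodels OLS fit as input; the name hat_matrix is made up) that recovers the hat matrix from a fitted model:
def hat_matrix(lm_fit):
    """Return the hat matrix H = X (X'X)^{-1} X' of a fitted OLS model (my sketch)."""
    X = lm_fit.model.exog
    return X @ np.linalg.inv(X.T @ X) @ X.T
hat_matrix(lm_fit).shape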
Rdatasets ("Datasets distributed with R") is a collection of data sets placed on GitHub by a U of M professor. We will get our data from there from now on!
Need to produce four plots: residuals vs fitted values, normal Q-Q, scale-location, and residuals vs leverage.
This should mostly be a copy of Emre Can's GSOC project to make R graphs in Python.
def plot_add_label(ax, xlabel=None, ylabel=None, title=None):
    if ax is None: return ax
    if xlabel: ax.set_xlabel(xlabel)
    if ylabel: ax.set_ylabel(ylabel)
    if title: ax.set_title(title)
    return ax
def plot_add_lowess(x, y, ax, color='red'):
    if ax is None: raise RuntimeError('Lowess needs axes to work with!')
    smooth_data = sm.nonparametric.lowess(endog=y, exog=x)
    fig = sns.lineplot(x=smooth_data[:,0], y=smooth_data[:,1], color=color, ax=ax)
    return fig
def plot_1_resid_vs_fitted(lm_fit,
                           ax=None,
                           xlabel="fitted values",
                           ylabel="residuals",
                           title="Residuals vs fitted values"):
    if ax is None: ax = plt.gca()
    x = lm_fit.fittedvalues
    y = lm_fit.resid
    fig = sns.scatterplot(x=x, y=y, ax=ax)
    # dashed gray reference line at zero residual, spanning the x range
    xrange = ax.get_xlim()
    zeros = np.zeros(2)
    sns.lineplot(x=xrange, y=zeros, color='gray', style=zeros, dashes=[[2,4]],
                 legend=False, ax=ax)
    plot_add_lowess(x, y, ax)
    plot_add_label(ax, xlabel=xlabel, ylabel=ylabel, title=title)
    return fig
plot_1_resid_vs_fitted(lm_fit);
def plot_2_residuals_vs_normal(lm_fit,
                               ax=None,
                               xlabel="Theoretical Quantiles",
                               ylabel="Standardized Residuals",
                               title="Normal Q-Q"):
    y_residuals = lm_fit.get_influence().resid_studentized_internal
    if ax is None: ax = plt.gca()
    fig = sm.qqplot(y_residuals, line='q', ax=ax)
    plot_add_label(ax, xlabel=xlabel, ylabel=ylabel, title=title)
    return fig
plot_2_residuals_vs_normal(lm_fit);
def plot_3_std_resid_vs_fitted(lm_fit,
                               ax=None,
                               xlabel="fitted values",
                               ylabel=r"$\sqrt{|\mathrm{Standardized\ residuals}|}$",
                               title="Scale-Location"):
    if ax is None: ax = plt.gca()
    x = lm_fit.fittedvalues
    # plot sqrt(|standardized residuals|) to match the y label and R's Scale-Location plot
    y = np.sqrt(np.abs(lm_fit.get_influence().resid_studentized_internal))
    fig = sns.scatterplot(x=x, y=y, ax=ax)
    plot_add_lowess(x, y, ax)
    plot_add_label(ax, xlabel=xlabel, ylabel=ylabel, title=title)
    return fig
plot_3_std_resid_vs_fitted(lm_fit);
def plot_4_std_resid_vs_leverage(lm_fit,
                                 ax=None,
                                 xlabel="Leverage",
                                 ylabel="Standardized residuals",
                                 title="Residuals vs Leverage"):
    if ax is None: ax = plt.gca()
    x = lm_fit.get_influence().hat_matrix_diag               # leverage
    y = lm_fit.get_influence().resid_studentized_internal    # studentized residuals
    fig = sns.scatterplot(x=x, y=y, ax=ax)
    plot_add_lowess(x, y, ax)
    plot_add_label(ax, xlabel=xlabel, ylabel=ylabel, title=title)
    return fig
plot_4_std_resid_vs_leverage(lm_fit);
def plot_lm_summary(lm_fit,
                    title="Linear Model Summary (R-style)",
                    figsize=(15,10)):
    """Produce an R-style summary of a fitted linear model.
    The linear model must be the result of a statsmodels OLS fit."""
    fig, ax = plt.subplots(nrows=2, ncols=2, figsize=figsize)
    plot_1_resid_vs_fitted(lm_fit, ax=ax[0,0])
    plot_2_residuals_vs_normal(lm_fit, ax=ax[0,1])
    plot_3_std_resid_vs_fitted(lm_fit, ax=ax[1,0])
    plot_4_std_resid_vs_leverage(lm_fit, ax=ax[1,1])
    if title: fig.suptitle(title, fontsize=18)
    return fig
plot_lm_summary(lm_fit);