import pandas as pd

Using the Advertising dataset, available at:

https://www.kaggle.com/bumba5341/advertisingcsv

import statsmodels.api as sm

df_adv = pd.read_csv('Advertising.csv', index_col=0)
df_adv.head()

X = df_adv[['TV','radio','newspaper']]
y = df_adv['sales']

print(X,y)

        TV  radio  newspaper
1    230.1   37.8       69.2
2     44.5   39.3       45.1
3     17.2   45.9       69.3
4    151.5   41.3       58.5
5    180.8   10.8       58.4
..     ...    ...        ...
196   38.2    3.7       13.8
197   94.2    4.9        8.1
198  177.0    9.3        6.4
199  283.6   42.0       66.2
200  232.1    8.6        8.7

[200 rows x 3 columns] 1      22.1
2      10.4
3       9.3
4      18.5
5      12.9
       ... 
196     7.6
197     9.7
198    12.8
199    25.5
200    13.4
Name: sales, Length: 200, dtype: float64

Fit an ordinary least squares (OLS) model with an intercept on TV, radio and newspaper.

X = sm.add_constant(X)

X

model = sm.OLS(y, X).fit()

model.summary()  # the const row is the intercept estimate, B0
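To show roughly what `sm.add_constant` contributes, the sketch below prepends a column of ones to a design matrix and solves the least-squares problem with NumPy. It uses small synthetic, noise-free data (an assumption, not the Advertising data), so the fitted coefficients recover the made-up values exactly; the coefficient on the ones column is the intercept B0 that statsmodels labels `const`.

```python
import numpy as np

# Synthetic, noise-free data (assumption for illustration): y = 3 + 0.5*x1 + 2*x2
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = 3.0 + 0.5 * x[:, 0] + 2.0 * x[:, 1]

# Prepend a column of ones, as sm.add_constant does by default
X = np.column_stack([np.ones(len(x)), x])

# Ordinary least squares: minimise ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[0] is the intercept B0 ("const"); beta[1:] are the slopes
print(beta)
```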

import matplotlib.pyplot as plt
import seaborn as sns

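corr = X.iloc[:, 1:].corr()  # pairwise correlations of the predictors (const column excluded)
corr

# Visualise the correlation matrix (not the raw data) to spot highly correlated predictors
plt.imshow(corr, cmap='autumn')
plt.colorbar()
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns)
plt.show()

sns.heatmap(corr, annot=True, linewidths=0.5, cmap='coolwarm')
plt.show()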
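A correlation matrix only shows pairwise relationships. A common complementary check is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors; a frequently quoted rule of thumb flags values above 5 or 10. The NumPy sketch below assumes synthetic data in which `x2` is deliberately built as a near-copy of `x1`, and `vif` is a hypothetical helper name (statsmodels also ships `variance_inflation_factor` in `statsmodels.stats.outliers_influence`).

```python
import numpy as np

def vif(X):
    """Hypothetical helper: VIF for each column of X (predictors only, no constant column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic data (assumption): x2 is x1 plus a little noise, x3 is independent
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(X, rowvar=False).round(3))  # pairwise correlations
print(vif(X).round(1))                         # x1 and x2 get large VIFs, x3 stays near 1
```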

Using the Salary dataset, available at:

https://github.com/mr-siddy/Machine-Learning/blob/master/Linear%20Regression/Salary_Data.csv

df_salary = pd.read_csv('Salary_Data.csv')
df_salary.head()

X = df_salary[['YearsExperience','Age']]
y = df_salary['Salary']

Fit an OLS model of Salary on YearsExperience and Age.

X = sm.add_constant(X)
model = sm.OLS(y,X).fit()

# Check R-squared, const, std err and P>|t|: large standard errors and an
# insignificant Age coefficient suggest YearsExperience and Age are highly correlated
model.summary()
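Why do large standard errors point to correlated predictors? For OLS the coefficient covariance is sigma^2 (X^T X)^(-1); when two columns are nearly collinear, X^T X is close to singular, so its inverse, and with it the standard errors, grows large. The NumPy sketch below, on synthetic data (an assumption), compares the standard error of the `x1` coefficient when its partner predictor is independent versus when it is a near-copy of `x1`.

```python
import numpy as np

def ols_std_errors(X, y):
    """OLS coefficients and classical (non-robust) standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)        # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # sigma^2 (X^T X)^(-1)
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (assumption): y depends on x1 only
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

x_indep = rng.normal(size=n)                  # unrelated to x1
x_corr = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1

ones = np.ones(n)
_, se_indep = ols_std_errors(np.column_stack([ones, x1, x_indep]), y)
_, se_corr = ols_std_errors(np.column_stack([ones, x1, x_corr]), y)

print("std err of x1, independent partner:", se_indep[1].round(3))
print("std err of x1, correlated partner: ", se_corr[1].round(3))
```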

corr = X.iloc[:, 1:].corr()  # correlation between YearsExperience and Age
corr

sns.heatmap(corr, annot=True, cmap='summer')
plt.show()


How to resolve the multicollinearity

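Check the p-values and drop the feature with the higher p-value. Here that is Age (P>|t| = 0.165), which is not significant once YearsExperience is in the model, so refit without it.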

drop_age = X.drop('Age', axis=1)

model = sm.OLS(y,drop_age).fit()
model.summary()

Output: OLS summary for the Advertising model (sales on TV, radio and newspaper)
Dep. Variable:	sales	R-squared:	0.897
Model:	OLS	Adj. R-squared:	0.896
Method:	Least Squares	F-statistic:	570.3
Date:	Tue, 27 Apr 2021	Prob (F-statistic):	1.58e-96
Time:	19:40:31	Log-Likelihood:	-386.18
No. Observations:	200	AIC:	780.4
Df Residuals:	196	BIC:	793.6
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
const	2.9389	0.312	9.422	0.000	2.324	3.554
TV	0.0458	0.001	32.809	0.000	0.043	0.049
radio	0.1885	0.009	21.893	0.000	0.172	0.206
newspaper	-0.0010	0.006	-0.177	0.860	-0.013	0.011

Omnibus:	60.414	Durbin-Watson:	2.084
Prob(Omnibus):	0.000	Jarque-Bera (JB):	151.241
Skew:	-1.327	Prob(JB):	1.44e-33
Kurtosis:	6.332	Cond. No.	454.

Output: correlation matrix of the Advertising predictors
	TV	radio	newspaper
TV	1.000000	0.054809	0.056648
radio	0.054809	1.000000	0.354104
newspaper	0.056648	0.354104	1.000000

Output: df_salary.head()
	YearsExperience	Age	Salary
0	1.1	21.0	39343
1	1.3	21.5	46205
2	1.5	21.7	37731
3	2.0	22.0	43525
4	2.2	22.2	39891

Output: df_adv.head()
	TV	radio	newspaper	sales
1	230.1	37.8	69.2	22.1
2	44.5	39.3	45.1	10.4
3	17.2	45.9	69.3	9.3
4	151.5	41.3	58.5	18.5
5	180.8	10.8	58.4	12.9

Output: residual diagnostics, Salary model with YearsExperience and Age
Omnibus:	2.695	Durbin-Watson:	1.711
Prob(Omnibus):	0.260	Jarque-Bera (JB):	1.975
Skew:	0.456	Prob(JB):	0.372
Kurtosis:	2.135	Cond. No.	626.

Output: residual diagnostics, Salary model with YearsExperience only
Omnibus:	2.140	Durbin-Watson:	1.648
Prob(Omnibus):	0.343	Jarque-Bera (JB):	1.569
Skew:	0.363	Prob(JB):	0.456
Kurtosis:	2.147	Cond. No.	13.2

Output: coefficients, Salary model with YearsExperience and Age
	coef	std err	t	P>|t|	[0.025	0.975]
const	-6661.9872	2.28e+04	-0.292	0.773	-5.35e+04	4.02e+04
YearsExperience	6153.3533	2337.092	2.633	0.014	1358.037	1.09e+04
Age	1836.0136	1285.034	1.429	0.165	-800.659	4472.686

Output: coefficients, Salary model with YearsExperience only (Age dropped)
	coef	std err	t	P>|t|	[0.025	0.975]
const	2.579e+04	2273.053	11.347	0.000	2.11e+04	3.04e+04
YearsExperience	9449.9623	378.755	24.950	0.000	8674.119	1.02e+04
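The two Salary diagnostics report a `Cond. No.` of 626 and 13.2. statsmodels computes this as the ratio of the largest to the smallest singular value of the design matrix, which is what `np.linalg.cond` returns by default; a near-duplicate column pushes the smallest singular value toward zero and inflates the ratio. The sketch below uses synthetic data (an assumption). Note that the condition number is also sensitive to column scale, not only to correlation.

```python
import numpy as np

# Synthetic data (assumption): x2 is a near-duplicate of x1
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)

X_reduced = np.column_stack([np.ones(n), x1])   # const + x1
X_full = np.column_stack([np.ones(n), x1, x2])  # const + x1 + near-duplicate

# 2-norm condition number: largest singular value / smallest singular value
cond_reduced = np.linalg.cond(X_reduced)
cond_full = np.linalg.cond(X_full)

print(round(cond_reduced, 1), round(cond_full, 1))
```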