# Q1: What is Multicollinearity?

`Multicollinearity occurs when two or more independent features (e.g. x1 and x2) are strongly correlated with each other, for example with an absolute correlation above 0.9 (90%).`

## Q2: How do we detect multicollinearity and how do we resolve it?

Let's say we are solving a regression or classification problem with 10–15 features, i.e. a dataset of shape n x 15 (n rows, 15 features).

Step 1: Plot a correlation heat map that compares every feature with every other feature.

Step 2: Say step 1 shows that features f3 and f4 are highly correlated, with a correlation above 90%.

Step 3: Remove one of the two correlated features, preferring the one whose p-value in the fitted model's summary is greater than 0.05.
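Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names `highly_correlated_pairs` and `plot_heatmap`, the synthetic `demo` frame, and the 0.9 threshold default are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(features: pd.DataFrame, threshold: float = 0.9):
    """Step 2: return (feature_a, feature_b, corr) for pairs with |corr| > threshold."""
    corr = features.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skips self-pairs
            r = float(corr.iloc[i, j])
            if abs(r) > threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


def plot_heatmap(features: pd.DataFrame):
    """Step 1: draw the correlation heat map (needs matplotlib installed)."""
    import matplotlib.pyplot as plt

    corr = features.corr()
    fig, ax = plt.subplots()
    im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax)
    return fig


# Synthetic data (an assumption for illustration): f4 is almost a copy of f3.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "f1": rng.normal(size=200),
    "f2": rng.normal(size=200),
    "f3": rng.normal(size=200),
})
demo["f4"] = demo["f3"] + rng.normal(scale=0.1, size=200)

pairs = highly_correlated_pairs(demo)
print(pairs)  # expected to list only the (f3, f4) pair
```

Calling `plot_heatmap(demo)` would draw the heat map for the same data; the pair list is the part you would act on in step 3.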

Important note: checking the correlation of every feature pair by eye is not practical when there are many features, e.g. a dataset of shape n x 200 (n rows, 200 features). In that case we use Ridge and Lasso regression instead.
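As a rough sketch of that idea, the snippet below fits plain least squares, Ridge, and Lasso on synthetic data in which two features are nearly identical. It assumes scikit-learn is available; the data, the variable names, and the `alpha` values are illustrative choices, not tuned recommendations.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data (an assumption for illustration): x2 is nearly a copy of x1.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
x3 = rng.normal(size=n)
X_demo = np.column_stack([x1, x2, x3])
y_demo = 3 * x1 + 2 * x3 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X_demo, y_demo)   # no penalty
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)   # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)   # L1 penalty can set coefficients to 0

print("OLS  :", ols.coef_)
print("Ridge:", ridge.coef_)
print("Lasso:", lasso.coef_)
```

With collinear inputs the unpenalized coefficients for `x1` and `x2` tend to be unstable, Ridge tends to spread the weight across the two, and Lasso tends to keep one and push the other toward zero. In practice `alpha` would be chosen by cross-validation.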

Let's take an example:

1:

X = df[['TV', 'radio', 'newspaper']]  # independent features
y = df['sales']  # dependent feature (target)

`y = b0 + b1*x1 + b2*x2 + b3*x3`

`x1 = TV, x2 = radio, x3 = newspaper, y = sales; b0 = intercept; b1, b2, b3 = slopes (coefficients)`

## To check whether there is a multicollinearity issue, we use an OLS (Ordinary Least Squares) model

import statsmodels.api as sm
X = sm.add_constant(X)  # add a constant column of 1s so the model fits the intercept b0

model = sm.OLS(y, X).fit()
model.summary()

import matplotlib.pyplot as plt
X.iloc[:, 1:].corr()  # pairwise correlations of the features, skipping the constant column

2:

X = df1[['YearsExperience', 'Age']]  # independent features
y = df1['Salary']  # dependent feature (target)

## Using the OLS model

import statsmodels.api as sm
X = sm.add_constant(X)  # add a constant column of 1s so the model fits the intercept b0