Advanced Topics: Normal Equation, Polynomial Regression and R-squared Score


Transcript

Hello everyone, welcome to the course on machine learning with Python. In this video, we shall learn about some advanced topics. Our first topic is the normal equation. The training data set can be written as follows: there are in total k features and n training samples. Notice that we have added one extra feature column, with all values equal to one, to the left. The training samples can now be written as (x^(i), y^(i)) for i = 1 to n, where x^(i) is nothing but the vector of the feature values x_0, x_1, x_2, ..., x_k. Notice that x_0^(i) will always be one, for all i from 1 to n. Now, the equation y = theta_0 + theta_1 x_1 + theta_2 x_2 + ... + theta_k x_k can be written in vector form as y = theta^T x, where theta is nothing but the vector of the model parameters, that is, theta_0, theta_1, theta_2, ..., theta_k. Notice that we have used the transpose in order to denote that this is a column vector.

Now, in the linear regression model, we are trying to estimate the model parameter vector from the given set of data. Let the estimated parameter vector be theta_hat, and the corresponding predicted values be y_hat. Then, in vector-matrix notation, y_hat is the vector of the predicted values y_hat_1, y_hat_2, ..., y_hat_n, which is nothing but the matrix X multiplied with theta_hat. So the vector y_hat = X theta_hat. What is this X matrix? It is the feature matrix we have already seen, with one row per training sample and a leading column of ones. Now, the mean squared error cost function in matrix notation can be written as follows: J(theta_hat) = (X theta_hat - y)^T (X theta_hat - y). What is this X theta_hat? It is nothing but y_hat, as I have already pointed out. So it is (y_hat - y)^T (y_hat - y).

So, it is a scalar. Ultimately, what we are actually computing is ||y_hat - y||^2: I am taking the norm of the vector y_hat - y and then squaring it. That gives me the mean squared error cost function. Note that y_hat = X theta_hat is the vector of the predicted values, and y is the vector of the actual values. Now, we try to simplify this cost function.
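To make the matrix notation concrete, here is a minimal, self-contained NumPy sketch (my own illustration with made-up numbers, not the lecturer's code): it builds X with the leading column of ones, forms y_hat = X theta_hat, and checks that the matrix form of the cost equals the squared norm of y_hat - y.

```python
import numpy as np

# Made-up data: n = 4 samples, k = 2 features, plus hypothetical parameters and targets
X_raw     = np.array([[1.0, 2.0],
                      [2.0, 0.5],
                      [3.0, 1.5],
                      [4.0, 3.0]])
theta_hat = np.array([0.5, 1.0, -2.0])     # [theta_0, theta_1, theta_2]
y         = np.array([1.0, 2.0, 0.0, -1.0])

# Prepend the x_0 = 1 column to get the n x (k+1) matrix X
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

y_hat = X @ theta_hat                                  # predictions y_hat = X theta_hat
residual = y_hat - y
cost_matrix_form = residual @ residual                 # (X theta_hat - y)^T (X theta_hat - y)
cost_norm_form   = np.linalg.norm(residual) ** 2       # ||y_hat - y||^2
print(np.isclose(cost_matrix_form, cost_norm_form))    # True: both forms give the same scalar
```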

So, our first step would be to push the transpose operation inside the parentheses: (X theta_hat - y)^T becomes (X theta_hat)^T - y^T. Now, we multiply the first parenthesis with the second parenthesis, term by term, and this gives the expanded form. Note that (AB)^T is nothing but B^T A^T, so (X theta_hat)^T is equal to theta_hat^T X^T. Again, note that y^T (X theta_hat) and (X theta_hat)^T y are scalars, and each is the transpose of the other; since the transpose of a scalar is the scalar itself, the two terms are equal.

So, we can say that y^T X theta_hat = (X theta_hat)^T y, and using the same rule we can expand it as theta_hat^T X^T y; that is why we get a factor of 2 for this term. Now, if we differentiate this cost function with respect to theta_hat and set the derivative to zero, what do we get? We get X^T X theta_hat = X^T y, or in other words theta_hat = (X^T X)^(-1) X^T y, assuming that X^T X is invertible. This is our normal equation, from which we can find the model parameters directly if we know the feature matrix X and the corresponding vector of actual values y. Now, between gradient descent and the normal equation, which one is preferable? Though the normal equation directly gives the solution without iteration, unlike gradient descent, it has several drawbacks: for example, for a large data set, computing (X^T X)^(-1) is a costly operation.
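As an aside (the lecture's own Python implementation comes in the next video), here is a minimal NumPy sketch of the normal equation on made-up data; note that in practice one solves the linear system rather than forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: n = 100 samples, k = 2 features
n, k = 100, 2
X_raw = rng.normal(size=(n, k))
true_theta = np.array([3.0, 1.5, -2.0])           # [theta_0, theta_1, theta_2], chosen for illustration
X = np.hstack([np.ones((n, 1)), X_raw])           # add the column of ones
y = X @ true_theta + 0.1 * rng.normal(size=n)     # targets with a little noise

# Normal equation: theta_hat = (X^T X)^(-1) X^T y
# np.linalg.solve(A, b) solves A theta = b, which is more stable than inverting A
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # should be close to true_theta
```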

Moreover, if X^T X is non-invertible, we cannot use the normal equation directly. The workaround in that case is to use the pseudo-inverse (a quick sketch follows this paragraph). The pseudo-inverse is a numerical method, based on matrix operations from linear algebra, which computes an approximate inverse of a non-invertible matrix. Gradient descent is more popular and is a good choice for solving the linear regression problem, though gradient descent has its own demerits: it is susceptible to local minima, and we have to choose certain parameters like the learning rate, which is not trivial when we see different kinds of cost-versus-iteration curves. But overall, gradient descent is more popular, and we will be using it for solving the linear regression problem. Okay, so our next topic is polynomial regression. Consider the following example, where I have shown the scatter plot of some biomedical data. We can fit a straight line through the data points, of the form y = theta_0 + theta_1 x, but we can do better, right? We could fit a second-order polynomial of the form y = theta_0 + theta_1 x + theta_2 x^2, or a third-order polynomial of the form y = theta_0 + theta_1 x + theta_2 x^2 + theta_3 x^3.
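For the pseudo-inverse fallback mentioned above, NumPy already provides one; the sketch below (my own illustration on made-up data) uses np.linalg.pinv, which yields a least-squares solution even when X^T X is singular.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data where two feature columns are identical, so X^T X is singular
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x1])        # duplicate column -> X^T X not invertible
y = 2.0 + 0.7 * x1 + 0.05 * rng.normal(size=n)

# Moore-Penrose pseudo-inverse still gives a (minimum-norm) least-squares solution
theta_hat = np.linalg.pinv(X) @ y
print(theta_hat)
```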

In general, we can fit an n-th order polynomial to the data points, so the model for the n-th order polynomial is y = theta_0 + theta_1 x + theta_2 x^2 + theta_3 x^3 + ... + theta_n x^n. Now, the smaller the value of n, the lower the complexity of the model, but the model may not fit the data set appropriately. So we have to choose n accordingly, such that we get a reasonably good fit with reasonable complexity. We can convert the polynomial regression problem into a multiple linear regression problem just by assigning x_1 = x, x_2 = x^2, x_3 = x^3, ..., x_n = x^n, and then constructing the multiple linear regression model y = theta_0 + sum of theta_i x_i for i from 1 to n, as shown in the sketch below. Note that x_i is nothing but x to the power i.
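A minimal sketch of that feature substitution (again my own illustration, with made-up one-dimensional data): build the columns x, x^2, x^3 and reuse the same normal-equation solve as before.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up 1-D data following a roughly cubic trend
x = np.linspace(-2, 2, 60)
y = 1.0 - 0.5 * x + 0.8 * x**3 + 0.2 * rng.normal(size=x.size)

# Columns [1, x, x^2, x^3]: polynomial regression as multiple linear regression
degree = 3
X = np.vander(x, degree + 1, increasing=True)

# Same normal-equation solve as before
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)   # approximately [1.0, -0.5, 0.0, 0.8]
```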

For more than one predictor variable, polynomial regression becomes more complicated. For two predictor variables x_1 and x_2, the general form of the second-order polynomial becomes y = theta_0 + theta_1 x_1 + theta_2 x_2 + theta_3 x_1 x_2 + theta_4 x_1^2 + theta_5 x_2^2. So for two predictor variables and a second-order polynomial, it already becomes a six-parameter model, which is quite complex. Let's move ahead and discuss something called the coefficient of determination. To determine the goodness of fit of a linear regression model, we use a quantitative measure called the coefficient of determination, or R-squared score. It is defined as follows. Let m be the number of data points, let y = (y_1, y_2, ..., y_m) be the vector of actual values of the target variable, and let y_hat = (y_hat_1, y_hat_2, ..., y_hat_m) be the vector of predicted values of the target variable.

Note that I have used boldface for y and y_hat in order to indicate that these are vectors, while the versions without boldface denote scalars. Now, let y_bar be the mean of the target variable. Then the total sum of squares is defined as follows: TSS = sum over i from 1 to m of (y_i - y_bar)^2. The total sum of squares is proportional to the variance of the target variable. We already know what a residual is, so how do we compute the residual sum of squares? The residual sum of squares is (actual minus predicted) squared, summed over all data samples: RSS = sum over i from 1 to m of (y_i - y_hat_i)^2. The fraction of unexplained variance is defined as FUV = RSS / TSS, that is, the residual sum of squares divided by the total sum of squares. The coefficient of determination, or R-squared, also called the fraction of explained variance,

is defined as R^2 = 1 - (fraction of unexplained variance), that is, R^2 = 1 - FUV = 1 - RSS / TSS. The coefficient of determination R^2 lies between zero and one: the closer the value of R^2 is to one, the better the regression model fits our data set and the better it can explain the observed variability of the target variable. A smaller value of R^2 implies that the regression model is not that good. It can be shown that, for a bivariate data set, R^2 is equal to the square of the correlation coefficient between the predictor and the target variable. So, that's all for this one; in the next video, we shall learn how to code the normal equation in Python. So see you in the next lecture. Thank you.
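As a quick reference ahead of the next video, here is a minimal sketch (my own, on made-up numbers) of computing TSS, RSS, and R^2 exactly as defined above:

```python
import numpy as np

# Made-up actual and predicted values for m = 5 data points
y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

y_bar = y.mean()
tss = np.sum((y - y_bar) ** 2)      # total sum of squares
rss = np.sum((y - y_hat) ** 2)      # residual sum of squares

r_squared = 1.0 - rss / tss         # R^2 = 1 - RSS / TSS
print(round(r_squared, 4))
```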
