Linear Regression on Bivariate Data

Transcript

Hello everyone, welcome to the course of machine learning with Python. We will be starting a new module called regression analysis, and in today's video, we will learn about something called linear regression. So what is linear regression? First, we will begin our study of simple linear regression. Consider the scatterplot of weight versus height of adults as shown in this figure; the nature or form of the relationship is strongly positive, as you can see from the scatterplot. Now, suppose we wish to estimate the weight of a person just by knowing his or her height. In order to do so, we first fit a straight line through our data points like this. Then, from the graph, knowing the height, we can find the weight of the corresponding person. Hence, we are intending to find the equation of the straight line that best describes the relationship between weight and height.

Now there is only one predictor or input variable, that is the height, and one target variable, that is the weight, and we intend to find a relationship of the form y equals theta zero plus theta one x. Here y is the target variable and x is the predictor variable; y will be nothing but the weight and x will be nothing but the height. We have to find theta zero and theta one such that the straight line y equals theta zero plus theta one x fits our data set best. This is called simple linear regression, because it has only one predictor, that is x, and the relationship between the target variable and the predictor variable is linear; it is nothing but a straight line. Now, the simple linear regression model with a single predictor will describe our model: y equals theta zero plus theta one x plus epsilon. Here, theta zero and theta one are called the model parameters and y is called the target variable.
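
As a minimal illustration of what this model form means in code, the sketch below generates a synthetic height-weight data set following y = theta zero + theta one x + epsilon. The intercept, slope, and noise level used here are made-up values for demonstration only, not numbers from the lecture.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration:
# weight (kg) ~ theta0 + theta1 * height (cm) + random residual
rng = np.random.default_rng(0)
theta0_true = -100.0   # intercept (assumed value)
theta1_true = 0.95     # slope (assumed value)

m = 50                                   # number of training samples
x = rng.uniform(150, 190, size=m)        # heights in cm (predictor)
epsilon = rng.normal(0, 5, size=m)       # random residuals
y = theta0_true + theta1_true * x + epsilon   # weights in kg (target)
```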

Theta zero is called the intercept, theta one is called the slope, x is called the predictor variable, and epsilon is called the random residual. Later, we use our sample data to find estimates of the model parameters theta zero and theta one. We call the estimate of theta zero theta zero hat, and the estimate of theta one theta one hat. We can then predict what the value of y should be corresponding to a particular value of x by using the least squares prediction equation, also known as our hypothesis function, which is denoted by y hat equals theta zero hat plus theta one hat x. Here, y hat is our predicted variable; y is the true value and y hat is our prediction. Now, residuals and the residual sum of squares for a sample (x, y):

The predicted value of y_i is y_i hat, which we obtain from the equation y_i hat equals theta zero hat plus theta one hat times x_i. Now, there is some error associated with this estimate. The error for this estimate will be e_i equals y_i minus y_i hat, that is, actual minus predicted; this represents the residual. We define the residual sum of squares, or RSS, as the sum over all e_i squared, with i running from one to m. We can expand the value of e_i as shown over here, as e_i is nothing but y_i minus y_i hat, and replacing the value of y_i hat we get the residual sum of squares in this form; note that there are m training samples in total. Now, the mean squared error cost function: we can define the cost function g, which is a function of the model parameters theta zero hat and theta one hat, as half of the residual sum of squares divided by the number of training samples, that is, one by twice m multiplied with the residual sum of squares, where m is the total number of training samples. A factor of half is included just for computational simplicity; otherwise, the cost function g is nothing but the mean or average of the squared residuals, which is also known as the mean squared error or MSE.
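
To make these definitions concrete, here is a small sketch (continuing with the synthetic x and y arrays above) of the hypothesis function, the residuals, the residual sum of squares, and the cost g as one over twice m times the RSS. The trial parameter values passed in at the end are arbitrary.

```python
def hypothesis(x, theta0_hat, theta1_hat):
    """Least squares prediction equation: y_hat = theta0_hat + theta1_hat * x."""
    return theta0_hat + theta1_hat * x

def cost(x, y, theta0_hat, theta1_hat):
    """Cost g = RSS / (2m), i.e. half the mean squared error."""
    m = len(y)
    residuals = y - hypothesis(x, theta0_hat, theta1_hat)   # e_i = y_i - y_i_hat
    rss = np.sum(residuals ** 2)                            # residual sum of squares
    return rss / (2 * m)

# Cost for some arbitrary trial parameters
print(cost(x, y, -90.0, 0.9))
```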

Now, our objective is to find suitable values of theta zero hat and theta one hat such that the cost function g is minimized; in other words, the residual sum of squares is minimized. Then the straight line will fit our data set best. This is called the least squares fit. Now let us intuitively understand what the cost function means. Consider the example of a single predictor variable, where the hypothesis function is of the form y hat equals theta zero hat plus theta one hat times x, and the cost function g has this form. There are two model parameters, theta zero hat and theta one hat; we will keep one parameter fixed and vary the other, and see how the cost function g varies.

So let's say this is our data set. First we fix theta zero hat and vary theta one hat; theta zero hat is nothing but the intercept part and theta one hat is nothing but the slope part of the straight line. So this is one straight line that we can fit, and the corresponding cost would be this one; this is another straight line, and note that when a straight line fits better than the previous one, the cost will be less. Similarly, this is another straight line, and this is another straight line. So as we increase the value of theta one hat, the cost function g will first decrease and then increase again. So there is one sweet spot of theta one hat where the cost function is minimized, and this is that particular theta one hat.
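
As a rough sketch of this experiment, the snippet below reuses the cost function defined earlier, keeps theta zero hat fixed at an arbitrary value, and sweeps theta one hat over a grid. The resulting curve first decreases and then increases, with a single minimum; the analogous sweep over theta zero hat with the slope fixed shows the same bowl shape.

```python
import matplotlib.pyplot as plt

theta0_fixed = -100.0                        # intercept held fixed (arbitrary choice)
theta1_grid = np.linspace(0.0, 2.0, 200)     # candidate slopes to try
costs = [cost(x, y, theta0_fixed, t1) for t1 in theta1_grid]

plt.plot(theta1_grid, costs)
plt.xlabel("theta1_hat (slope)")
plt.ylabel("cost g")
plt.title("Cost versus slope with the intercept fixed")
plt.show()
```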

Now we will see the case where theta one hat is fixed but we vary theta zero hat. Okay, so we will fix the slope but vary the intercept. Let's see: for a zero intercept we will get some cost value, and for another intercept we will get some other cost value. If we keep on changing theta zero hat, we again see a similar curve of the cost function; here also there is a sweet spot, or a minimum, where the cost function is minimized for a certain value of theta zero hat. So our objective is to find those particular values of theta one hat and theta zero hat for which this cost function is minimized. Now, solving for this fit: in order to minimize the residual sum of squares, or the cost function, with respect to theta zero hat and theta one hat, we will take the partial derivatives of the residual sum of squares with respect to theta zero hat and theta one hat and set them individually equal to zero.

Okay, so we will take the partial derivative of the residual sum of squares with respect to theta zero hat and set it equal to zero, and again we will take the partial derivative of the residual sum of squares with respect to theta one hat and set it equal to zero. By solving these two equations we get the following values of theta one hat and theta zero hat; for simplicity, I am not showing the entire derivation, but I would recommend students to do it on their own. So theta one hat is nothing but the sum over x_i minus x bar multiplied with y_i minus y bar, where i goes from one to m, divided by the sum over x_i minus x bar whole squared, i from one to m. If we expand and rearrange, we can see that this is nothing but the correlation coefficient between x and y multiplied with the standard deviation of y divided by the standard deviation of x. And theta zero hat will be nothing but the mean of y minus theta one hat, which we have already obtained, multiplied with the mean of x, or x bar. So first we obtain theta one hat, and putting that value in here, we get the estimated value of theta zero hat. Here x bar and y bar are the mean values of the predictor x and the target y respectively, sigma x and sigma y are their standard deviations, and r_xy, as I have already stated, is the correlation coefficient between x and y.
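
Putting these closed-form estimates into code, the sketch below computes theta one hat as the sum of (x_i minus x bar)(y_i minus y bar) over the sum of (x_i minus x bar) squared, checks the equivalent correlation form r_xy times sigma y over sigma x, and then gets theta zero hat as y bar minus theta one hat times x bar, again on the synthetic arrays from above.

```python
x_bar, y_bar = x.mean(), y.mean()

# theta1_hat = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar)^2)
theta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Equivalent form: correlation coefficient times (sigma_y / sigma_x)
r_xy = np.corrcoef(x, y)[0, 1]
theta1_hat_alt = r_xy * y.std() / x.std()    # should match theta1_hat

# theta0_hat = y_bar - theta1_hat * x_bar
theta0_hat = y_bar - theta1_hat * x_bar

print(theta0_hat, theta1_hat, theta1_hat_alt)
```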

In the next video, we shall see how to implement bivariate linear regression in Python. See you in the next lecture. Thank you.
