Hello everyone, welcome to the course on machine learning with Python. In this particular video, we will see how to explore the power of Python in order to solve a linear regression problem with a single predictor variable. That means this is a bivariate regression analysis in Python. So first we will be importing the data set. Now, you are already familiar with this data set from the EDA analysis in Python in the last module. So here also we'll be using the hybrid gender constant data.
So first, we'll be reading this data into our data frame called constant data. And we'll be printing the first few rows. As you can see, I have printed the first five rows of the data frame. Now we want to predict the weight based on the person's height. So here the predictor variable x is equal to height and the target variable y is equal to weight. So I'll be taking y and x into separate NumPy arrays.
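As a rough sketch of this step: the snippet below builds a tiny made-up data frame in place of the lecture's data file, so the numbers are not the lecture's numbers. The column names Height and Weight and the variable name data are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in for the lecture's data set; the real data would be read
# from a file (for example with pd.read_csv). Column names are assumed.
data = pd.DataFrame({
    "Height": [120.0, 135.0, 150.0, 165.0, 180.0],
    "Weight": [25.0, 33.0, 41.0, 50.0, 58.0],
})

# head() shows the first five rows by default.
print(data.head())

# Predictor x (height) and target y (weight) as separate NumPy arrays.
x = data["Height"].to_numpy()
y = data["Weight"].to_numpy()
print(type(x), type(y))
```

Typing x or y in a notebook cell afterwards would display the array contents.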
Okay, as you can see, y is nothing but a NumPy array, and similarly if I type x, x is also a NumPy array. Now we shall find the mean and the standard deviation of x and y, so for that we'll be importing NumPy. So the mean value of y is y underscore mean, which is nothing but np.average of y or np.mean of y, while std underscore dv underscore y stores the standard deviation of y, and I have obtained the value using np.std with y within the brackets. Similarly, we can obtain the mean or the average value of x and the standard deviation of x. Okay, now I am going to run this particular cell and then I'll be printing all the values I have obtained. The x mean here is 138.26. The y mean is 35.61. The standard deviation of x is 27.58 and the standard deviation of y is 14.7.
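A minimal sketch of the mean and standard deviation step, using made-up height and weight values rather than the lecture's data. The variable names here are assumptions chosen for readability.

```python
import numpy as np

# Made-up values for illustration only (not the lecture's data set).
x = np.array([120.0, 135.0, 150.0, 165.0, 180.0])  # heights
y = np.array([25.0, 33.0, 41.0, 50.0, 58.0])       # weights

# np.mean and np.average agree when no weights are passed.
x_mean = np.mean(x)
y_mean = np.mean(y)

# np.std uses the population formula (ddof=0) by default.
std_x = np.std(x)
std_y = np.std(y)

print("x mean:", x_mean, "y mean:", y_mean)
print("std of x:", std_x, "std of y:", std_y)
```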
Now, we shall find the correlation coefficient between x and y. We have already seen how to obtain this in the exploratory data analysis class. In the same way, we'll obtain the correlation coefficient between x and y using the np.corrcoef function. Okay, and the correlation coefficient rounded to three decimal places is 0.941. Now, we shall find the estimated model parameters using the equations we have seen in the last video. So, our estimate of theta one is equal to the correlation coefficient between x and y multiplied by the standard deviation of y, divided by the standard deviation of x. So, we'll go ahead and run this particular cell and print the estimated value of theta one, and that is 0.5017. Similarly, we can go ahead and obtain the estimated value of the parameter theta zero, which is nothing but y mean minus the estimated value of theta one multiplied by x mean.
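The two estimates just described can be sketched as follows, again on made-up values rather than the lecture's data. The names r, theta_1 and theta_0 are assumptions; np.corrcoef returns a 2 by 2 matrix, so the off-diagonal entry is the correlation between x and y.

```python
import numpy as np

# Made-up values for illustration only (not the lecture's data set).
x = np.array([120.0, 135.0, 150.0, 165.0, 180.0])  # heights
y = np.array([25.0, 33.0, 41.0, 50.0, 58.0])       # weights

# Correlation coefficient between x and y (off-diagonal of the 2x2 matrix).
r = np.corrcoef(x, y)[0, 1]

# Slope: correlation times std of y, divided by std of x.
theta_1 = r * np.std(y) / np.std(x)

# Intercept: mean of y minus slope times mean of x.
theta_0 = np.mean(y) - theta_1 * np.mean(x)

print("r:", round(r, 3))
print("theta_1:", theta_1, "theta_0:", theta_0)
```

The ratio of the two standard deviations does not depend on whether the population or sample formula is used, as long as the same one is used for both x and y.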
So if we run this particular cell, we can see that theta zero is nothing but minus 33.756. Now we can go ahead and plot the regression line along with the data. So we'll import matplotlib.pyplot for this particular plotting. Now, to plot the regression line, we need some x data, which is nothing but linearly spaced values between the minimum value of x and the maximum value of x, and we are using a hundred data points. And y data is nothing but theta zero plus theta one multiplied by x data. Let's go ahead and compute x data and y data.
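A small sketch of how x data and y data might be computed. The heights and the two parameter values below are made up for illustration, and the names x_data and y_data are assumptions based on how they are spoken in the lecture.

```python
import numpy as np

# Made-up heights and hypothetical parameter values, for illustration only.
x = np.array([120.0, 135.0, 150.0, 165.0, 180.0])
theta_0 = -41.6
theta_1 = 0.55

# 100 linearly spaced points from the minimum to the maximum of x.
x_data = np.linspace(x.min(), x.max(), 100)

# Points on the regression line: theta_0 + theta_1 * x.
y_data = theta_0 + theta_1 * x_data

print(x_data[:3], y_data[:3])
```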
Now in this particular cell, we'll be plotting the scatter plot of x and y, that means weight versus height. Along with that, we will be plotting the x data and y data, and we'll color it red to show the regression line. I have also turned the grid on, so you can see a grid inside the plot. So, as you can see, these blue dots are nothing but the scatter plot of weight versus height, and the red line shows us the regression line. So, this is the best-fit straight line for the entire data set. So, that is all for this one. We have seen how to explore the power of Python in order to solve the bivariate regression analysis problem.
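A plot like the one described could be sketched roughly as below. The data values are made up, and the axis labels, legend, grid and the Agg backend line are assumptions added so the sketch runs on its own; in a notebook the backend line can be dropped and plt.show() called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumed: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up values for illustration only (not the lecture's data set).
x = np.array([120.0, 135.0, 150.0, 165.0, 180.0])  # heights
y = np.array([25.0, 33.0, 41.0, 50.0, 58.0])       # weights

# Estimates from the correlation coefficient and the standard deviations.
r = np.corrcoef(x, y)[0, 1]
theta_1 = r * np.std(y) / np.std(x)
theta_0 = np.mean(y) - theta_1 * np.mean(x)

# Points along the fitted line.
x_data = np.linspace(x.min(), x.max(), 100)
y_data = theta_0 + theta_1 * x_data

fig, ax = plt.subplots()
ax.scatter(x, y, color="blue", label="data")                  # blue dots
ax.plot(x_data, y_data, color="red", label="regression line") # red line
ax.set_xlabel("Height")
ax.set_ylabel("Weight")
ax.grid(True)
ax.legend()
```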
In the next video, we shall go into the deeper detail of regression analysis, which is multiple linear regression. Thank you, see you in the next lecture.