Concept of Co-variance and Correlation

Machine Learning Using Python Statistics and Exploratory Data Analysis
6 minutes

Transcript

Hello everyone, welcome to the course on machine learning with Python. In the last video, we saw how to interpret a scatterplot, that is, its direction, form, and strength. In this video, we will learn some quantitative measures that describe the relationship between two quantitative variables. Okay, so let's go ahead. The covariance between two quantitative variables measures how variation in one variable is associated with variation in the other. How do we calculate the covariance between two quantitative variables? Let's say x and y are two quantitative variables, and this is the table of the values of x and y: x assumes the values x1, x2, x3, up to xn, and y assumes the values y1, y2, y3, up to yn. Then the covariance of x and y is equal to one upon n times the sum, over i from 1 to n, of (xi minus x bar) multiplied by (yi minus y bar), that is, $\mathrm{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, where x bar is the mean, or average value, of x and y bar is the mean, or average value, of y.
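The covariance formula described above can be sketched in Python as follows. This is only a minimal illustration, assuming NumPy is available; the function name `covariance` and the sample values are made up for this sketch and are not from the lecture. It uses the one-upon-n form given in the lecture.

```python
import numpy as np

def covariance(x, y):
    """Covariance with the 1/n convention: mean of (x_i - x_bar) * (y_i - y_bar)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    return float(np.mean((x - x.mean()) * (y - y.mean())))

# Illustrative values only (hypothetical data, not the table from the lecture)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = covariance(x, y)
print(cov_xy)

# np.cov uses 1/(n-1) by default; bias=True switches it to the 1/n form used here
cov_numpy = np.cov(x, y, bias=True)[0, 1]
print(cov_numpy)
```

Note that many libraries default to dividing by n minus 1 (the sample covariance), so results can differ slightly from the one-upon-n formula unless that option is changed.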

Now, what is the intuition behind covariance? For a positive relationship, this is the scatterplot. Think of this middle point as (x bar, y bar). If we shift the origin of our coordinates to this point, then in the first quadrant xi minus x bar is positive and yi minus y bar is also positive, so their product must be positive. Similarly, in the third quadrant, both xi minus x bar and yi minus y bar are negative, so their product is also positive. In the second and fourth quadrants, however, exactly one of xi minus x bar and yi minus y bar is negative, hence the product is negative. Now, if we sum the products of (xi minus x bar) and (yi minus y bar) over all i, we can see that the number of points with a positive product is greater than the number of points with a negative product. Hence, the total covariance will be positive.
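The quadrant argument can be checked numerically. The sketch below assumes NumPy and uses synthetic data with a positive relationship (the slope 0.8, noise level, and seed are arbitrary choices for illustration). It counts the points whose product of deviations is positive (first and third quadrants around the mean point) and negative (second and fourth quadrants).

```python
import numpy as np

# Synthetic data with a positive relationship (parameters are arbitrary assumptions)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

# Shift the origin to the mean point (x_bar, y_bar)
dx = x - x.mean()
dy = y - y.mean()
products = dx * dy

n_positive = int(np.sum(products > 0))  # points in quadrants 1 and 3
n_negative = int(np.sum(products < 0))  # points in quadrants 2 and 4
cov_xy = float(products.mean())         # covariance with the 1/n convention

print(n_positive, n_negative, cov_xy)
```

Strictly speaking, the sign of the covariance depends on the sum of the products rather than only on the counts, but for a scatterplot like this one the two agree.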

So, a covariance greater than zero implies that there is a positive relationship. Similarly, a covariance less than zero implies a negative relationship: you can see that there are more points in the second and fourth quadrants than in the first and third quadrants. Now, consider this kind of scatterplot, where all the quadrants contain an almost equal number of points. Here the covariance is almost zero, which means there is no linear relationship at all. Okay.

Now, the correlation coefficient. The correlation coefficient is a numerical measure of the strength and the direction of a linear relationship between two quantitative variables. How is the correlation coefficient calculated? Again, let us consider this table of the values of x and y: x assumes the values x1 up to xn, and y assumes the values y1, y2, up to yn. Let sigma x and sigma y be the standard deviations of the values of x and y respectively, and let there be n data points. Then the correlation coefficient between x and y, denoted by r with suffix xy, is one upon n times the sum, over i from 1 to n, of (xi minus x bar) divided by sigma x, multiplied by (yi minus y bar) divided by sigma y: $r_{xy} = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{\sigma_x}\right) \left(\frac{y_i - \bar{y}}{\sigma_y}\right)$. Now, expanding sigma x and sigma y, and noting that x bar and y bar are nothing but the mean values of x and y respectively, the correlation coefficient between x and y can be written as the covariance of x and y divided by the product of the square root of the variance of x and the square root of the variance of y: $r_{xy} = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)}\,\sqrt{\mathrm{var}(y)}}$. We can think of this as a normalized covariance between x and y. Okay.
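As a rough check that the two forms of the correlation coefficient agree, the sketch below computes r both from standardized deviations and as a normalized covariance. It assumes NumPy; the function name `correlation` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def correlation(x, y):
    """Mean of the products of standardized deviations (1/n convention)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()  # np.std divides by n by default (ddof=0)
    zy = (y - y.mean()) / y.std()
    return float(np.mean(zx * zy))

# Illustrative values only (hypothetical data, not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r_xy = correlation(x, y)

# Same quantity written as a normalized covariance
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_from_cov = float(cov_xy / (np.sqrt(np.var(x)) * np.sqrt(np.var(y))))

# NumPy's built-in Pearson correlation, for comparison
r_numpy = float(np.corrcoef(x, y)[0, 1])

print(r_xy, r_from_cov, r_numpy)  # all three are approximately 0.775
```

Unlike the covariance, the correlation coefficient does not depend on whether n or n minus 1 is used, as long as the same convention is applied to the covariance and to both standard deviations.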

So, interpreting the values of the correlation coefficient. The correlation coefficient always lies between minus one and plus one. Positive values of the correlation coefficient indicate a positive relationship; negative values indicate a negative relationship. If the value is close to plus one, that indicates a strong positive linear relationship. If the value is close to minus one, that indicates a strong negative linear relationship. r values close to zero indicate that the relationship is neither positive nor negative. Okay. So, we can see a few examples. A correlation coefficient of 0.995 implies a strong positive linear relationship. The value -0.575 is a kind of weak, or moderately weak, negative linear relationship. Similarly, r = 0.436 is a weak positive linear relationship. r = 0.1 implies almost no relationship at all, and r = -0.897 is a strong negative linear relationship.

Okay, now let's go ahead and discuss a topic called causation. The scatterplot below illustrates how the number of firefighters sent to fight fires is related to the amount of damage caused by the fires in a certain city. The scatterplot clearly displays a fairly strong relationship between the two variables: one variable is the damage and the other is the number of firefighters. Now, can we say that sending more firefighters will cause more damage? Of course not. So what is going on here? There is a third variable in the background, the seriousness of the fire, that is responsible for the observed relationship between the damage and the number of firefighters. The more serious the fire, the greater the damage, and the more firefighters will be required. Okay. This third variable, which is influencing the observed relationship between the two variables, is called the lurking variable. Here the lurking variable is the seriousness of the fire. Okay.
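The firefighter example can be imitated with a small simulation. Everything in this sketch is an assumption made for illustration: the data are synthetic, and the coefficients, noise levels, and the helper name `residuals` are invented. Both the number of firefighters and the damage are generated from the seriousness of the fire only, so neither one causes the other, yet they come out strongly correlated. Removing the part explained by seriousness makes the correlation almost vanish.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Lurking variable: seriousness of the fire (synthetic, arbitrary scale)
severity = rng.uniform(0, 10, size=n)

# Both observed variables depend on severity only, not on each other
firefighters = 3 * severity + rng.normal(scale=2, size=n)
damage = 5 * severity + rng.normal(scale=4, size=n)

# Raw correlation between firefighters and damage
r_raw = float(np.corrcoef(firefighters, damage)[0, 1])

def residuals(v, z):
    """Part of v left over after a straight-line fit on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

# Correlation after removing the effect of severity from both variables
r_adjusted = float(
    np.corrcoef(residuals(firefighters, severity), residuals(damage, severity))[0, 1]
)

print(r_raw, r_adjusted)
```

In this simulation the raw correlation is high while the adjusted one is close to zero, which is what we expect when a lurking variable drives both quantities.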

Hence, correlation does not always imply causation. In the next video, we will see how to do exploratory data analysis in Python. See you in the next lecture. Thank you.
