Principal component analysis

Machine Learning Using Python: Dimensionality Reduction
12 minutes

Transcript

Hello everyone, welcome to the course on machine learning with Python. In this video, we shall learn about principal component analysis, also known as PCA. Let's first look at variance and covariance. Variance and covariance are measures of the spread of a set of points around their center of mass, or mean. Variance measures the spread of a feature around its center of mass. For a feature x which takes the values $x_1, x_2, \ldots, x_n$, the variance is calculated as $\mathrm{Var}(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$, where $\mu$ is nothing but the mean of all these values $x_i$, that is, $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$. Covariance between two features measures how much each feature varies from its mean with respect to the other.

For example, consider a table which records the values of two features $x_1$ and $x_2$: $x_1$ can take the values $x_{11}, x_{12}, x_{13}, \ldots, x_{1n}$ and $x_2$ can take the values $x_{21}, x_{22}, x_{23}, \ldots, x_{2n}$. The covariance between $x_1$ and $x_2$ is $\mathrm{Cov}(x_1, x_2) = \frac{1}{n}\sum_{i=1}^{n}(x_{1i} - \mu_1)(x_{2i} - \mu_2)$, where $\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ji}$ for $j$ equal to 1 or 2; $j = 1$ gives $\mu_1$ and $j = 2$ gives $\mu_2$. Note that the covariance of a feature with itself is simply its variance, that is, $\mathrm{Cov}(x, x) = \mathrm{Var}(x)$.
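As a quick illustration of these two formulas, here is a minimal NumPy sketch; the arrays x1 and x2 are made-up example values, not data from the lecture.

```python
import numpy as np

# Hypothetical example values for two features (illustrative only)
x1 = np.array([2.0, 4.0, 6.0, 8.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0])

n = len(x1)
mu1, mu2 = x1.mean(), x2.mean()

# Variance of x1: (1/n) * sum((x1_i - mu1)^2)
var_x1 = np.sum((x1 - mu1) ** 2) / n

# Covariance of x1 and x2: (1/n) * sum((x1_i - mu1) * (x2_i - mu2))
cov_x1_x2 = np.sum((x1 - mu1) * (x2 - mu2)) / n

print(var_x1, cov_x1_x2)
# np.var(x1) and np.cov(x1, x2, bias=True)[0, 1] give the same values
```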

Now, the covariance matrix. If we have a three-dimensional data set with features $x_1, x_2, x_3$, then we can measure the covariance between $x_1$ and $x_2$, between $x_2$ and $x_3$, and between $x_1$ and $x_3$. Measuring the covariance of $x_1$ with $x_1$, $x_2$ with $x_2$, and $x_3$ with $x_3$ gives us the variances of $x_1$, $x_2$ and $x_3$ respectively. We can collect all of these variances and covariances into a matrix of dimension $3 \times 3$; this is known as the covariance matrix. As you can see, the diagonal entries of this covariance matrix are nothing but the variances: $\mathrm{Cov}(x_1, x_1) = \mathrm{Var}(x_1)$, $\mathrm{Cov}(x_2, x_2) = \mathrm{Var}(x_2)$, and $\mathrm{Cov}(x_3, x_3) = \mathrm{Var}(x_3)$, while the off-diagonal elements are the covariances between the different features or variables.
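A minimal sketch of building such a covariance matrix with NumPy, using random stand-in data for three features (not the lecture's figure):

```python
import numpy as np

# Stand-in data: 100 samples of three features x1, x2, x3 (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# np.cov expects variables in rows, so pass the transpose; bias=True uses the 1/n convention
cov_matrix = np.cov(X.T, bias=True)

print(cov_matrix.shape)                       # (3, 3)
print(np.diag(cov_matrix))                    # diagonal entries = variances of x1, x2, x3
print(np.allclose(cov_matrix, cov_matrix.T))  # True: the covariance matrix is symmetric
```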

Similarly, for $d$-dimensional data we shall have a $d \times d$ covariance matrix. Now, properties of the covariance matrix: the covariance matrix is a symmetric matrix, because $\mathrm{Cov}(x_i, x_j) = \mathrm{Cov}(x_j, x_i)$, and the covariance matrix is positive semi-definite, that is, all of its eigenvalues are non-negative. This can be proved using linear algebra, which we are skipping over here. Now, principal component analysis. Let's first talk about the intuition behind principal component analysis. Consider the data points as shown in the figure; they are spread inside a two-dimensional plane.

Now, let's look at the data set from a different perspective. All the data points shall now be projected onto the tilted orthogonal axes denoted by $u_1$ and $u_2$. What is unique about the $u_1$-$u_2$ system of coordinates as compared to the original $x_1$-$x_2$ system of coordinates? The variance of the data points is maximum along $u_1$, so the axis $u_1$ contains the maximum information about the data set; this is called the first principal component of the data set. Similarly, $u_2$, which contains relatively less information, is known as the second principal component. Hence, the data set can be represented by only one feature, that is $u_1$, without much loss of information. This is how PCA acts as a dimensionality reduction technique. Note that the principal components are linear combinations of the original bases of the data set.

Thus PCA is a linear feature extraction technique. Now, standardization or normalization is required for PCA; that means, before performing PCA we have to normalize the data. Why is this required? Usually the data set contains variables or features which are in different units of measurement. For example, the weight of a person measured in kg and the sugar level in milligram per deciliter. Now, if the unit of weight is changed to gram, then the weight column would contain values which are simply a thousand times the weights in kg.

Now, we know that the variance of a scalar multiple of a random variable x is that scalar squared times the variance of x; if $a$ is the scalar, then $\mathrm{Var}(a x) = a^2 \, \mathrm{Var}(x)$. Hence, changing the scale of measurement will change the variance significantly. However, the interrelationship between the variables will not change. Thus, a few variables whose values are large in range will dominate the principal components and give misleading directions for the principal components. Hence, it is always advisable to perform normalization on the data set before applying PCA. With normalization or standardization we make sure that all the data falls on the same scale.
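As a quick numerical illustration of this scaling issue, here is a small NumPy sketch with made-up weight values showing that converting kg to grams multiplies the variance by $1000^2$, which is why an unscaled column can dominate PCA.

```python
import numpy as np

# Hypothetical weights in kg (illustrative values, not from the lecture)
weight_kg = np.array([55.0, 68.0, 72.0, 90.0])
weight_g = weight_kg * 1000.0   # same quantity, expressed in grams

var_kg = np.var(weight_kg)
var_g = np.var(weight_g)

# Var(a*x) = a^2 * Var(x): the variance blows up by a factor of 1000^2
print(var_kg, var_g, np.isclose(var_g, (1000.0 ** 2) * var_kg))
```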

Now, the steps for standardizing the data. Consider our $d$-dimensional data set, and let the vector $\mu = (\mu_1, \ldots, \mu_d)$ denote the means of the columns; that means $\mu_j$ is the expectation of the $j$-th column, for $j$ from 1 to $d$. Similarly, let $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_d)$ denote the standard deviations of the columns, where each $\sigma_j$ is the square root of the variance of column $j$. First, we can make the data set zero-centered: we replace each column by subtracting its mean from all the elements in that column. Then we make each column unit variance. This is done after the first step, that is, after the data is zero-centered: we divide each column by its standard deviation. Now, the variance of each column of the transformed data is one. So, after normalization, the covariance matrix of the normalized data is simply the correlation matrix.
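A minimal sketch of these two standardization steps (zero-centering, then scaling to unit variance) on random stand-in data; scikit-learn's StandardScaler performs an equivalent transformation.

```python
import numpy as np

# Stand-in data: 200 samples, 3 features on very different scales (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(loc=[10.0, -5.0, 100.0], scale=[1.0, 20.0, 0.5], size=(200, 3))

# Step 1: zero-center each column by subtracting its mean
X_centered = X - X.mean(axis=0)

# Step 2: divide each column by its standard deviation -> unit variance
X_std = X_centered / X.std(axis=0)

# After standardization, the covariance matrix equals the correlation matrix of X
cov_std = np.cov(X_std.T, bias=True)
corr = np.corrcoef(X.T)
print(np.allclose(cov_std, corr))   # True
```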

Now, the variance probe. Let us assume that we have a zero-centered data set, that means the expectation of the vector $x$ is zero. Let $u$ be the unit vector along the direction of a principal component. As $u$ is a unit vector, its norm should be equal to one, so its norm squared should also be equal to one; the norm squared is nothing but the dot product of $u$ with itself, that is, $u^T u = 1$. The projection of $x$ on $u$, denoted $a$, is obtained as $a = u^T x = x^T u$. Now let's see a few statistical properties of $a$. The expected value of $a$ is the expected value of $u^T x$, which is $u^T$ times the expected value of $x$, because $u$ is fixed in space with respect to $x$; and as the expected value of $x$ equals zero, the expected value of $a$ is zero. Now, the variance of $a$ is the expected value of $a^2$ minus the square of the expected value of $a$. As the expected value of $a$ is zero, the variance is simply the expected value of $a^2$. So, the variance of $a$ is the expected value of $a \cdot a$, where the first $a$ I am writing as $u^T x$ and the second as $x^T u$. Again, $u^T$ and $u$ are constants with respect to $x$.

So, we can take these two out of the expectation, and what remains is that the variance of $a$ is $u^T \, E[x x^T] \, u$. Now, the expectation of $x x^T$ is $\Sigma$, which is nothing but our covariance matrix, so the variance is a function of $u$; let's call it $\psi(u)$, where $\psi(u) = u^T \Sigma u$. This is known as the variance probe. Now, PCA as an optimization problem. Our objective is to find a unit vector $u$ along which the variance of the data set is maximum. As the variance probe $\psi(u)$ estimates the variance of the data set along the direction of $u$, we can reformulate PCA as the following optimization problem: find the vector $u$ which maximizes the variance probe $\psi(u)$, subject to the constraint that $u$ is a unit vector.
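To make the variance-probe identity concrete, here is a small numerical sketch on made-up zero-centered data, checking that the sample variance of the projection $a = u^T x$ equals $u^T \Sigma u$ for a unit vector $u$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
X = X - X.mean(axis=0)            # zero-center, so E[x] = 0 as assumed above

Sigma = (X.T @ X) / len(X)        # covariance matrix of the centered data

u = np.array([1.0, 2.0, -1.0])
u = u / np.linalg.norm(u)         # make u a unit vector, so u^T u = 1

a = X @ u                         # projections a_i = u^T x_i
print(np.isclose(a.var(), u @ Sigma @ u))   # variance probe: Var(a) = u^T Sigma u
```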

So, we have to maximize $u^T \Sigma u$ subject to the constraint that $u^T u = 1$. This is a constrained optimization problem, and we shall use the Lagrange multiplier method to solve it. We construct a Lagrangian and maximize this modified objective function: we differentiate the Lagrangian with respect to $u$ and set it equal to zero. With some help from the standard formulas of matrix calculus, we arrive at the final expression $\Sigma u = \lambda u$. Thus principal component analysis simply reduces to an eigenvalue problem: we have to find the eigenvalues and the corresponding eigenvectors of the covariance matrix of the normalized data set. As the dimension of the data set is $d$, we get $d$ eigenvalues with their corresponding eigenvectors, and we sort them in descending order.
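A minimal sketch of this eigenvalue step on standardized stand-in data: compute the covariance matrix, take its eigendecomposition with np.linalg.eigh (suitable for symmetric matrices), and sort the eigenpairs in descending order of eigenvalue.

```python
import numpy as np

# Stand-in data: 300 samples, d = 4 features, mixed so they are correlated (illustrative only)
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize as discussed above

Sigma = np.cov(X.T, bias=True)             # covariance (= correlation) matrix

# eigh handles symmetric matrices; it returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Sort into descending order: lambda_1 >= lambda_2 >= ... >= lambda_d
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]                # column j is the j-th principal component

print(eigvals)                             # all non-negative (positive semi-definite)
```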

Let the eigenvalues in descending order be $\lambda_1, \lambda_2, \ldots, \lambda_d$, and let the corresponding eigenvectors be $u_1, u_2, u_3, \ldots, u_d$. As per this ordering, the vector $u_1$ is known as the first principal component of the data set. Similarly, $u_2$ is the second principal component of the data set. So, $\lambda_j$ is the variance along the $j$-th principal component. One interesting property of the eigenvectors, or principal components, is that each principal component is orthogonal to the others. The proof is written over here; you can go through it and see that $u_i^T u_j = 0$, that is, the dot product between two distinct principal components is equal to zero, so the principal components are orthogonal to each other. Now, the scree plot. A scree plot is a line plot of the eigenvalues of the correlation matrix in descending order.
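Continuing with stand-in data, this sketch verifies the orthogonality property numerically and projects the data onto the first k principal components to obtain the reduced representation.

```python
import numpy as np

# Stand-in data: 300 samples, 4 correlated features, standardized (illustrative only)
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X.T, bias=True))
order = np.argsort(eigvals)[::-1]
eigvals, U = eigvals[order], eigvecs[:, order]   # U's columns are the principal components

# Orthogonality: u_i^T u_j = 0 for i != j, so U^T U is the identity matrix
print(np.allclose(U.T @ U, np.eye(U.shape[1])))

# Dimensionality reduction: keep only the first k components and project the data
k = 2
X_reduced = X @ U[:, :k]
print(X_reduced.shape)                           # (300, 2)
```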

So, this is a scree plot of some data set. Usually, the data set shows maximum variance along the first principal component. The scree plot is a monotonically decreasing plot, and it helps us determine how many principal components are enough to faithfully represent the data without much loss of information. These are the properties of the scree plot: using it, one can identify the best reduced k-dimensional representation of the data set. Okay, so in the next video, we will discuss how to implement principal component analysis in Python using the scikit-learn library. So see you in the next lecture. Thank you.
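Here is a minimal sketch of a scree plot with matplotlib, drawn from the eigenvalues of the correlation matrix of stand-in data; sorting the eigenvalues in descending order is what makes the curve monotonically decreasing.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data: 300 samples, 6 correlated features, standardized (illustrative only)
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals = np.linalg.eigvalsh(np.cov(X.T, bias=True))
eigvals = np.sort(eigvals)[::-1]               # descending order

plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
plt.xlabel("Principal component index")
plt.ylabel("Eigenvalue (variance explained)")
plt.title("Scree plot")
plt.show()
```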
