Principal component analysis in Python

Machine Learning Using Python Dimensionality Reduction
7 minutes

Transcript

Hello, everyone. Welcome to the course on machine learning with Python. In this video, we shall learn how to implement principal component analysis in Python using the scikit-learn library. We are loading the breast cancer data set available in scikit-learn. So we have imported NumPy as np and, from sklearn.datasets, imported load_breast_cancer. So let's go ahead and run this particular cell.

It may take a little while to load the data. Now, we shall store the data in the variable named breast_cancer_data. So we go ahead and run this particular cell. Inside this variable breast_cancer_data, there is one field called DESCR, which describes the breast cancer data set. Let's go ahead and run this cell. And we can see that this is the description of the breast cancer data set.
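As a minimal sketch of the loading step described above, the following assumes scikit-learn is installed. The variable name description is introduced here only for illustration.

```python
from sklearn.datasets import load_breast_cancer

# Load the bundled breast cancer data set (a dictionary-like Bunch object).
breast_cancer_data = load_breast_cancer()

# DESCR holds a plain-text description of the data set.
description = breast_cancer_data.DESCR  # hypothetical helper name
print(description[:500])

# The two class names, in the order used by the target labels (0 and 1).
print(breast_cancer_data.target_names)
```

Printing only the first 500 characters keeps the output short; printing the whole string shows the full attribute list.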

So the total number of instances is 569 and the number of attributes is 30, all of them numeric. This is the attribute information: radius, texture, perimeter, area, smoothness, etc. There are only two classes, malignant or benign. So it is a classification problem with 30 features and 569 available data points. Now, X will be nothing but breast_cancer_data.data, and y will be nothing but the target. If we go ahead and run the cell and check X.shape, we can see that it contains 569 rows and 30 columns, while y contains only 569 entries. Now we shall standardize, or normalize, the data, so from sklearn.preprocessing we are importing StandardScaler. This is the class, and we create an object of the class StandardScaler. Then we use the fit_transform method to transform our original data set X into another data set, X_t, which has unit variance and zero mean. Now we can perform PCA.
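The extraction and standardization steps above could look roughly like this. The object name scaler is an assumption, since the transcript does not give the name used in the notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

breast_cancer_data = load_breast_cancer()
X = breast_cancer_data.data    # feature matrix
y = breast_cancer_data.target  # class labels (0 or 1)
print(X.shape, y.shape)

# Standardize each feature to zero mean and unit variance.
scaler = StandardScaler()      # hypothetical object name
X_t = scaler.fit_transform(X)

# Check the result: column means near 0, column standard deviations near 1.
print(np.abs(X_t.mean(axis=0)).max())
print(X_t.std(axis=0).min(), X_t.std(axis=0).max())
```

Standardizing first matters for PCA because features measured on large scales (such as area) would otherwise dominate the leading components.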

So, from sklearn.decomposition, we are importing PCA; PCA stands for principal component analysis. Note that this is also a class, so we are creating an object pca of this class, specifying the number of components equal to five. Note that the number of components depends on the user as well as on the data set. It can be 5, 3, 15, etc., but in no case can the number of components exceed the total number of available features; that means it should not exceed 30 in this particular case. Now, X_transform is nothing but pca.fit_transform applied to our normalized data, that is X_t. Let's go ahead and run this particular cell. If we now print the shape of our X_transform variable, we can see that it contains 569 rows but five columns. That means that from a 30-dimensional space, we have projected the data into a five-dimensional space.
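A minimal sketch of the projection step follows. The random_state argument is an addition made here for reproducibility and is not mentioned in the lecture.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load and standardize the data, as in the earlier steps.
X = load_breast_cancer().data
X_t = StandardScaler().fit_transform(X)

# Keep five principal components; this must not exceed the 30 features.
pca = PCA(n_components=5, random_state=0)  # random_state is an assumption
X_transform = pca.fit_transform(X_t)

print(X_t.shape)          # original standardized data
print(X_transform.shape)  # data projected onto five components
```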

Now we are using the first two principal components for data visualization. Let's say x1 is the value of the first principal component when y, that is the target variable, equals zero, and y1 is the value of the second principal component when the target variable equals zero. Similarly, x2 is the value of the first principal component when the target variable equals one, and y2 is the value of the second principal component when the target variable equals one. Okay, let's go ahead and run this particular cell. Now we shall import matplotlib.pyplot as plt. Here we are using the %matplotlib inline magic command; %matplotlib inline makes sure that the plot stays inside this notebook. And if we use %matplotlib notebook, then it will be an interactive plot.

If we omit this particular command, then it may or may not plot, depending on the version of the Matplotlib library we are using. Okay. So the figure size in this case is (7, 7). We are plotting x1 and y1 with color blue, with the label malignant, and the marker is a circle, and we are plotting x2 and y2 with color red, label benign, and marker triangle. We are giving the x label as Component 1 and the y label as Component 2. Let's go ahead and run this particular cell. As you can see, the blue dots denote the malignant cases and the red triangles denote the benign cases. As we can see, with these two components the two classes are very well separated. We can find a decision boundary to classify the data. Maybe it will not be a hundred percent accurate, but we will get close to very high accuracy with only these two components.
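The visualization described above can be sketched as follows. Two things are assumptions: the use of scatter (the lecture does not say which plotting call was used) and the Agg backend, which lets the script run outside a notebook. Inside a notebook, the %matplotlib inline magic command would replace the matplotlib.use line.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for script use
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
y = data.target
X_t = StandardScaler().fit_transform(data.data)
X_transform = PCA(n_components=5, random_state=0).fit_transform(X_t)

# First and second principal components, split by class label.
x1, y1 = X_transform[y == 0, 0], X_transform[y == 0, 1]  # malignant
x2, y2 = X_transform[y == 1, 0], X_transform[y == 1, 1]  # benign

fig, ax = plt.subplots(figsize=(7, 7))
ax.scatter(x1, y1, color="blue", marker="o", label="malignant")
ax.scatter(x2, y2, color="red", marker="^", label="benign")
ax.set_xlabel("Component 1")
ax.set_ylabel("Component 2")
ax.legend()
```

The boolean masks y == 0 and y == 1 select the rows of each class, and the column index picks the component.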

Okay. Now we shall do the scree plot. Inside this pca object there is one attribute called explained_variance_ratio_. This gives us the eigenvalues of the retained components in normalized form, that is, divided by the total variance. So let's go ahead and run this particular cell, and we can see that this is basically the scree plot. The x-axis shows the principal component, and it starts from zero because Python always starts the index at zero: 0, 1, 2, 3, 4, etc.
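A scree plot along these lines could be drawn as below. The axis labels, the marker style, and the Agg backend are assumptions made for this sketch; in a notebook, %matplotlib inline would be used instead of matplotlib.use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for script use
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_t = StandardScaler().fit_transform(load_breast_cancer().data)
pca = PCA(n_components=5, random_state=0)
pca.fit(X_t)

# Fraction of the total variance explained by each retained component.
ratios = pca.explained_variance_ratio_

fig, ax = plt.subplots()
ax.plot(range(len(ratios)), ratios, marker="o")
ax.set_xlabel("Principal component (zero-based index)")
ax.set_ylabel("Explained variance ratio")
```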

Zero means the first principal component, one means the second principal component, two means the third principal component, and so on. As we can see, as we move to higher principal components, the explained variance decreases drastically: the first principal component explains almost 45% of the variance of the data set and the second principal component explains roughly 19% of the variance in the data set. Okay. Now we can print the values of the explained variance ratio which we have plotted over there, and we can see that the values are decreasing in magnitude. Now, if we run this particular cell, we can see that 84.73% of the variance of the data set has been explained by the first five principal components. Okay. So, this is our discussion on principal component analysis.
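The numbers quoted above can be checked with a short sketch like this one. The names ratios and cumulative are introduced here for illustration, and random_state is again an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_t = StandardScaler().fit_transform(load_breast_cancer().data)
pca = PCA(n_components=5, random_state=0)
pca.fit(X_t)

# Per-component share of the total variance, in decreasing order.
ratios = pca.explained_variance_ratio_
print(ratios)

# Running total: the last entry is the share explained by all five components.
cumulative = np.cumsum(ratios)
print(cumulative)
print(f"Explained by the first five components: {cumulative[-1]:.2%}")
```

The cumulative sum is a common way to decide how many components to keep, for example by choosing the smallest number that reaches a target share of the variance.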

In the next video, we shall introduce a new module, artificial neural networks. So, see you in the next lecture. Thank you.
