Implementation of k-means clustering

Machine Learning Using Python Unsupervised Learning: Clustering
8 minutes

Transcript

Hello, everyone, welcome to the course of machine learning with Python. In this video, we shall discuss how to implement the k-means clustering algorithm in Python. So let's begin by importing the necessary libraries. We have imported the matplotlib library and the NumPy library, and along with that, from sklearn.datasets I have imported make_blobs. This is for generating blob-shaped data points. Okay, so let's go ahead and run this particular cell.

Now, we shall generate data points for clustering. We'll make use of this make_blobs function. I have specified that there are 200 data points in total, distributed in approximately three clusters. The number of features, or the dimensionality of the data set, is two; this is for visualization purposes. All these clusters are more or less normally distributed. I have set shuffle equal to True, which means the data points will be jumbled up.

Okay? They are not in sequential order. Now let's go ahead and print the shape of the data set. It is (200, 2), which means it contains 200 rows and two columns. Now we shall plot the data. Okay?
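The data-generation step described above can be sketched as follows. This is a minimal sketch assuming scikit-learn's `make_blobs`; the variable names `X` and `y` and the `random_state` value are assumptions added for reproducibility, since the lecture's actual cell is not shown in the transcript.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# 200 points, 2 features, roughly 3 Gaussian blobs, shuffled.
# random_state is an assumption so that reruns give the same data;
# drop it to get a different set of blobs on every run.
X, y = make_blobs(n_samples=200, n_features=2, centers=3,
                  shuffle=True, random_state=0)

print(X.shape)  # (200, 2)

# Scatter plot of the raw, unlabeled points.
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], s=20)
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
# In a script, call plt.show() to display the figure.
```

Two features are used only so that the points can be drawn on a plane; the clustering itself does not depend on that.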

So, here it is not very evident that there are three clusters, but you can run this particular function again and again to get a different arrangement of clusters. Okay. Now we have three different clusters, as you can see, so visually we can identify that this is one cluster and these are the other clusters. Okay, now we shall implement our k-means clustering algorithm. First, we shall identify the arguments of this kmeans_cluster function. X is our data set and k is the number of clusters; these are user provided. The tolerance value has been set to 0.001 by default. Okay, so the number of points in the data set has been identified, and then we have checked whether the number of clusters is less than or equal to the total number of points.

If not, we'll throw an error saying that the number of clusters cannot be greater than the total number of points in the data set. Okay. Now, we shall first choose the k initial centroids. Using the numpy.random.choice function, we have created k initial centroids. Okay, fine. This cluster_labels is basically a placeholder for the label of each data point, and new_centers is the placeholder for the newly computed centers. Fine. So, while flag is True, we loop.

Over the range of the number of points, we shall compute the distance of each data point from the centroids, and we shall assign the cluster label accordingly. So, if a particular data point is closest to the centroid of one cluster, we'll assign that point to that cluster. Okay. Then we shall recompute the cluster centroids, and then we shall check against the tolerance whether the centroids have moved significantly or not. If not, we shall set flag equal to False and get out of this while loop. Okay. So, as soon as we get convergence, which means we reach the tolerance level, we set flag equal to False in order to break out of the while loop. Okay, so the centers will be nothing but the new centroids that have been continuously recomputed after assigning the data points to the various clusters. So we will be returning the centroids and the cluster labels.
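The steps described above can be sketched as a single function. This is a hedged reconstruction, not the lecture's exact code: the name `kmeans_cluster` and the arguments `X`, `k`, and `tol` follow the transcript, while the `seed` argument, the use of `np.random.default_rng` in place of `np.random.choice`, the vectorized distance computation, the empty-cluster handling, and the exact form of the tolerance check are assumptions.

```python
import numpy as np

def kmeans_cluster(X, k, tol=1e-3, seed=None):
    """Cluster the rows of X into k groups; return (centers, cluster_labels)."""
    X = np.asarray(X, dtype=float)
    n_points = X.shape[0]
    if k > n_points:
        raise ValueError(
            "Number of clusters cannot be greater than the number of points."
        )

    # Pick k distinct data points as the initial centroids.
    # (seed is an assumption, added so that runs can be reproduced.)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n_points, size=k, replace=False)]

    cluster_labels = np.zeros(n_points, dtype=int)
    flag = True
    while flag:
        # Distance of every point to every centroid, shape (n_points, k).
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Each point takes the label of its nearest centroid.
        cluster_labels = distances.argmin(axis=1)

        # Recompute each centroid as the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[cluster_labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
                new_centers[j] = members.mean(axis=0)

        # Stop once the centroids have moved less than the tolerance.
        if np.linalg.norm(new_centers - centers) < tol:
            flag = False
        centers = new_centers

    return centers, cluster_labels

# Tiny usage example on two well-separated groups of points.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, cluster_labels = kmeans_cluster(X_demo, k=2, tol=0.001, seed=0)
print(centers)
print(cluster_labels)
```

Because the initial centroids are chosen at random, the order of the returned centroids (and therefore the label numbers) can change from run to run even when the clusters themselves are the same.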

Okay, so let's go ahead and run this particular cell. Now we shall test our algorithm. The number of clusters, k, equals three, as specified, so we shall call our function. We pass X as the argument, k is nothing but the number of clusters, and tol is, as usual, 0.001. So, let's go ahead and run this particular cell and see what the centers are. Okay, now we can plot the clusters. I have defined the colors and the markers it can take: red, green, blue, cyan, magenta, yellow, and so on, because we have to color code each cluster; the marker could be a circle, a triangle, a square, and so on. Okay, so here is a seven-by-seven figure that I'll use for plotting the clustered data set. Okay.

So, there are three clusters, right? So, for j in range of the number of clusters, j can take the values 0, 1, 2. Where the cluster label equals j, say zero, we shall plot the x value and the y value; the color will be chosen from here and the marker will also be chosen from here. Okay, so this is nothing but a scatter plot, and after that we'll be plotting the centroids as well, with a star marker. Fine. And the color is set as well, so let's see. Okay, so this is basically our clustered data set. Now let's go back and see what the original data was.
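The plotting loop described above can be sketched as follows. The function name `plot_clusters`, the exact color and marker lists, and the black color for the centroid stars are assumptions; the transcript only states that each cluster gets its own color and marker, that the figure is seven by seven, and that the centroids are drawn with a star marker. The small demo arrays stand in for the labels and centers that a k-means routine would return.

```python
import matplotlib.pyplot as plt
import numpy as np

# One color and one marker per cluster (lists are illustrative).
COLORS = ["r", "g", "b", "c", "m", "y"]
MARKERS = ["o", "^", "s", "D", "v", "p"]

def plot_clusters(X, labels, centers):
    """Scatter-plot each cluster in its own style, then the centroids as stars."""
    fig, ax = plt.subplots(figsize=(7, 7))
    for j in range(len(centers)):
        pts = X[labels == j]  # rows whose cluster label equals j
        ax.scatter(pts[:, 0], pts[:, 1],
                   c=COLORS[j % len(COLORS)],
                   marker=MARKERS[j % len(MARKERS)],
                   label=f"cluster {j}")
    # Centroids on top, drawn with a large star marker.
    ax.scatter(centers[:, 0], centers[:, 1],
               c="k", marker="*", s=200, label="centroids")
    ax.legend()
    return ax

# Demo with hand-made labels and centers for four points.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels_demo = np.array([0, 0, 1, 1])
centers_demo = np.array([[0.0, 0.5], [10.0, 10.5]])
ax = plot_clusters(X_demo, labels_demo, centers_demo)
# In a script, call plt.show() to display the figure.
```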

So this was the original data, and these are the clustered data points. As you can see, k-means has done a remarkably good job of clustering these data points, and these small stars over here are nothing but the centroids of these three clusters. Okay. You can see these are nothing but separate clusters. Okay. Now, we can also do this k-means clustering using sklearn, because all this functionality is built into sklearn. So, from sklearn.cluster we have to import the KMeans class, and this km is nothing but an object of the KMeans class, and I have specified the number of clusters equal to three. Okay.

Now, we shall fit the data set, and then we can plot the clusters that it has obtained. So, what is y_km here? y_km holds the cluster labels, like our cluster_labels over here. Okay, so where y_km equals zero, we'll plot the cluster with the cyan color and a square marker; similarly, for the next cluster the color is green and the marker is a circle, and for the next cluster the color is blue and the marker is a triangle. Let's go ahead and run this particular cell. Okay. So, it has also done a very good job of clustering this data set, right? So our algorithm and this algorithm are more or less similar. Now, if we want to print the cluster centers obtained by this process, we can do it like this.
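The scikit-learn version described above can be sketched as follows. `KMeans`, `fit_predict`, and `cluster_centers_` are real scikit-learn names; the variable names `km` and `y_km` are inferred from the audio, and `n_init`, `random_state`, and the regenerated data set are assumptions added to keep the example self-contained and reproducible.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in data: 200 two-dimensional points in roughly three blobs.
X, _ = make_blobs(n_samples=200, n_features=2, centers=3,
                  shuffle=True, random_state=0)

# Fit k-means with three clusters and get one label per point.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
y_km = km.fit_predict(X)

# Plot each cluster with its own color and marker:
# cyan squares, green circles, blue triangles.
styles = [("c", "s"), ("g", "o"), ("b", "^")]
fig, ax = plt.subplots(figsize=(7, 7))
for j, (color, marker) in enumerate(styles):
    ax.scatter(X[y_km == j, 0], X[y_km == j, 1],
               c=color, marker=marker, label=f"cluster {j}")
ax.legend()

# One row per cluster centroid.
print(km.cluster_centers_)
```

Here `fit_predict` fits the model and returns the labels in one call; calling `km.fit(X)` and then reading `km.labels_` would give the same labels.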

So let's go ahead and run this cell; these are basically the cluster centers we have obtained. Now, compare them with the cluster centers obtained from our scratch algorithm, meaning the algorithm that we built on our own: these are the same centroids. Note that this centroid has come over here, this centroid has come over here, and this centroid has come over here. So the sequence of the centroids does not matter; the only thing is that they have to match, in any order. They are in close agreement, or I can say they almost exactly match: the centroids we obtained with our own algorithm and those from the algorithm that is built into the Python library called sklearn. Okay. So, in the next video, we shall introduce another clustering algorithm, known as hierarchical clustering.
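Since the two sets of centroids can come back in a different order, a direct element-wise comparison is not enough. One possible way to check the match regardless of order is sketched below; the helper name `centers_match` and the tolerance `atol` are hypothetical and not part of the lecture or of scikit-learn.

```python
import numpy as np

def centers_match(a, b, atol=1e-2):
    """Return True if every centroid in a has its own close partner in b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    # Pairwise distances between the two sets of centroids.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nearest = d.argmin(axis=1)  # index of the closest centroid in b
    # Each centroid in b may be used only once.
    is_permutation = len(set(nearest.tolist())) == len(a)
    close_enough = np.all(d[np.arange(len(a)), nearest] <= atol)
    return bool(is_permutation and close_enough)

# Same centroids listed in a different order, with tiny numerical differences.
ours = np.array([[0.0, 0.5], [10.0, 10.5], [-5.0, 3.0]])
theirs = np.array([[10.001, 10.5], [-5.0, 3.002], [0.0, 0.499]])
print(centers_match(ours, theirs))  # True
```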

So, see you in the next lecture. Thank you.
