Unsupervised Learning

Machine Learning Using Python: Unsupervised Learning (Clustering)

Transcript

Hello everyone, welcome to the course on machine learning with Python. In this video, we shall learn about unsupervised learning. So, what is unsupervised learning? The data has no target attribute or class labels; we want to explore the data to find some intrinsic structure in it. Usually, the objects or data points are grouped into two or more groups based on their similarity or dissimilarity with respect to particular features, and this can produce completely different results depending on the features used for grouping. Grouping objects into two or more groups based on the similarities or dissimilarities of the objects, such that each object falls into exactly one group, is called clustering. So, how similar or dissimilar are the following two objects?

The dictionary definition of similarity is the quality or state of being similar; likeness; resemblance. Similarity is hard to define, but we know it when we see it; the real meaning of similarity is a philosophical question, but here we will take a more pragmatic approach. Similarity is a numerical measure of how alike two data objects are; it is higher when the two objects are more alike. Dissimilarity is a numerical measure of how different two data objects are; it is lower when the objects are more alike. So, between two data objects, if the similarity increases then the dissimilarity decreases, and vice versa; they have a kind of inverse relationship.

Usually, the similarity between two objects or data points is defined in terms of the distance between them: if two objects are more distant, they are more dissimilar, and if they are less distant, they are more similar. There are different distance metrics to measure similarity and dissimilarity, for both quantitative and categorical variables. The definition of a distance function, or distance metric, is as follows. Let the vectors x and y denote two different objects; then d(x, y) is a real number such that d(x, x) = 0, d(x, y) = d(y, x) (this is called the symmetry property), and, for any other object denoted by z, d(x, y) <= d(x, z) + d(z, y).
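
These properties (identity, symmetry, and the triangle inequality) are easy to check numerically. Here is a minimal sketch, not from the lecture, using NumPy and the Euclidean distance on a few random points:

```python
import numpy as np

# A quick numerical check of the distance-metric properties for the
# Euclidean distance, using three random 3-dimensional points.
rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 3))

def d(a, b):
    """Euclidean distance between two vectors."""
    return np.linalg.norm(a - b)

print(d(x, x))                           # identity: d(x, x) = 0
print(np.isclose(d(x, y), d(y, x)))      # symmetry: d(x, y) = d(y, x)
print(d(x, y) <= d(x, z) + d(z, y))      # triangle inequality
```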

This last condition is what we call the triangle inequality. Let's say we take two objects; if we calculate the distance between them by some method, we may get 0.23, because they are quite similar. Similarly, for two words we may get a distance of 3: this is the minimum edit distance, which is how many edits we must make in one word to convert it into the other word. And let's say we have two fingerprints; if we compute the distance by some means, we may get a larger value, because the two fingerprints are dissimilar. So, here are some distance metrics that we use as similarity or dissimilarity measures. The first one we have already discussed, which is the Euclidean distance.
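
Referring back to the word example above, the minimum edit distance can be computed with a standard dynamic-programming routine. A minimal sketch (the example words are illustrative, not from the lecture):

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))            # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))     # 3
```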

For two data points denoted by the vectors x and y, the Euclidean distance is defined as d_Euclidean(x, y) = sqrt((x - y)^T (x - y)). Similarly, we have already discussed what is called the Manhattan distance: for two data points denoted by the vectors x and y, the Manhattan distance is defined as d_Manhattan(x, y) = sum_{i=1}^{n} |x_i - y_i|; that is, we take the absolute difference of each coordinate and sum them up, and this is called the Manhattan distance. What is the Minkowski distance? The Minkowski distance between two points x and y is defined as d_Minkowski(x, y) = (sum_{i=1}^{n} |x_i - y_i|^h)^(1/h); that means we sum up the h-th powers of the absolute coordinate differences and then raise the sum to the power 1/h. For h = 2 it is the same as the Euclidean distance, and for h = 1 it is the same as the Manhattan distance. Now, cosine similarity: for two data points denoted by the vectors x and y, the cosine similarity is defined as the cosine of the angle between x and y, where theta is the angle between the x and y vectors.
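
These three distances can be computed directly with NumPy. A minimal sketch with two illustrative points (not from the lecture):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

# Euclidean distance: sqrt((x - y)^T (x - y))
d_euclidean = np.sqrt(np.sum((x - y) ** 2))

# Manhattan distance: sum of absolute coordinate differences
d_manhattan = np.sum(np.abs(x - y))

# Minkowski distance of order h: (sum |x_i - y_i|^h)^(1/h)
h = 3
d_minkowski = np.sum(np.abs(x - y) ** h) ** (1.0 / h)

print(d_euclidean, d_manhattan, d_minkowski)
# With h = 2 the Minkowski distance reduces to the Euclidean distance,
# and with h = 1 it reduces to the Manhattan distance.
```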

The cosine of this theta denotes the cosine similarity between the x vector and the y vector. This is not a distance measure; it is a kind of similarity measure. If theta is equal to 90 degrees, the x vector is perpendicular to the y vector, and if theta is close to zero, the two vectors are alike. The cosine similarity lies between -1 and +1: -1 means the two vectors point in completely opposite directions, 0 means they are perpendicular, and +1 means they are alike. And this is how we calculate the cosine similarity: cos(theta) = x^T y / (|x| |y|), where x^T y, as you can see, is nothing but the dot product between the x vector and the y vector.
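
A minimal NumPy sketch of this formula, with a few illustrative vectors covering the three extreme cases:

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between x and y: x.y / (|x| * |y|)."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
c = np.array([2.0, 0.0])

print(cosine_similarity(a, b))    # 0.0  -> perpendicular vectors
print(cosine_similarity(a, c))    # 1.0  -> vectors pointing the same way
print(cosine_similarity(a, -a))   # -1.0 -> vectors pointing in opposite directions
```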

Now, cluster analysis. What is cluster analysis? It means finding groups of objects in data such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in the other groups. The formal definition is: let X be a data set, and let C_1, C_2, ..., C_k be a collection of subsets such that each C_i is a subset of the original data set X and none of them is empty. The collection is called a clustering if, first, the intersection of C_i and C_j is empty for all i not equal to j, which means there is no common point between two clusters, so each point belongs to one and only one cluster; and second, if we take the union of all the clusters, we get back our original data set. The conditions to aim for are that the intra-cluster similarity is maximized and the inter-cluster dissimilarity is maximized (equivalently, the inter-cluster similarity is minimized).
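
As an illustration of this definition, a small sketch (not from the lecture) that checks whether a candidate collection of subsets really is a clustering of X, i.e. the subsets are non-empty, pairwise disjoint, and their union gives back the data set:

```python
from itertools import combinations

X = {1, 2, 3, 4, 5, 6}
clusters = [{1, 2}, {3, 4, 5}, {6}]   # a candidate clustering of X

def is_clustering(X, clusters):
    nonempty = all(len(c) > 0 for c in clusters)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(clusters, 2))
    covers = set().union(*clusters) == X
    return nonempty and disjoint and covers

print(is_clustering(X, clusters))     # True
```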

So, what are inter-cluster and intra-cluster? Within a cluster is known as intra-cluster, and between clusters is known as inter-cluster. The between-cluster distances should be maximized, while the within-cluster distances should be minimized. Now, what is the natural grouping among the following objects? Let's see an example. One person may form groupings like the Simpsons family and the school staff, someone else may go for a clustering into females and males, and someone else may go for a clustering into the group of adults and the group of children.

So, what we have learned from this is that clustering is subject to the choice of features. Here the feature could be adult or child, here it could be male or female, and here it could be Simpsons family or school staff. Based on the choice of features, we can obtain different clusters. Now, some applications of clustering. Image segmentation: we can break up an image into meaningful or perceptually similar regions, going from the original image to a segmented image. Social network analysis: in the study of social networks, clustering may be used to recognize communities within large groups of people. Medical imaging: on PET scans, cluster analysis can be used to differentiate between different types of tissue and blood in a three-dimensional image. And there are many more applications of clustering and different types of clustering.

There are several kinds of clustering algorithms. In partitioning clustering, the k-means, k-medoids, and k-medians algorithms are very popular. In hierarchical clustering there are agglomerative and divisive variants. Then there is density-based clustering; DBSCAN and OPTICS are algorithms based on density-based clustering, but there are many more. And there is fuzzy clustering. So far, this is our discussion of unsupervised learning. In the next lecture, we'll see how to use k-means clustering to obtain the clusters of a data set. So, see you in the next lecture. Thank you.
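
As a quick preview of the k-means lecture mentioned above, here is a minimal scikit-learn sketch on a toy data set (the data and parameters are illustrative, not from the course):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two small, well-separated blobs of 2-D points
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
               rng.normal(loc=5.0, scale=0.5, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.labels_[-5:])   # cluster assignments
print(kmeans.cluster_centers_)                   # learned cluster centers
```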
