kNN classifier

Transcript

Hello everyone, welcome to the course on machine learning with Python. In this video, we will learn about a new classifier called the k-nearest neighbor classifier, also known as the kNN classifier; kNN is short for k-nearest neighbor. First, what is Euclidean distance? Consider points in two dimensions: each point in two dimensions can be represented by a vector of dimension two. So the point (u1, u2) can be represented by a vector u, and the point (v1, v2) by a vector v. The Euclidean distance between these two points, denoted d(u, v), is equal to sqrt((u1 - v1)^2 + (u2 - v2)^2).
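
Since the course implements kNN in Python later on, here is a minimal sketch of this two-dimensional formula; the function name euclidean_2d is purely illustrative, not from the course:

import math

def euclidean_2d(u, v):
    # d(u, v) = sqrt((u1 - v1)^2 + (u2 - v2)^2)
    return math.sqrt((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2)

print(euclidean_2d((0, 0), (3, 4)))  # 5.0, the familiar 3-4-5 triangle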

So, it is nothing but the straight-line distance between the points u and v. In general, a point in n-dimensional space is represented as u = (u1, u2, u3, ..., un)^T; why the transpose? Because it is a column vector. So it is an n-dimensional vector, and hence the Euclidean distance between two points in n-dimensional space is represented as d(u, v) = sqrt((u1 - v1)^2 + (u2 - v2)^2 + ... + (un - vn)^2). In general, in vector notation, the Euclidean distance is written as sqrt((u - v)^T (u - v)), which, as you can see, is a scalar. Now for other distance metrics. The Manhattan distance for two data points denoted by x and y, both vectors, is defined as the sum of the absolute differences between the coordinates of x and y.
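
A short sketch of both forms with NumPy, assuming the points are given as 1-D arrays; again, the function names are illustrative:

import numpy as np

def euclidean(u, v):
    diff = u - v
    # sqrt((u - v)^T (u - v)) -- a scalar, as noted above
    return np.sqrt(diff @ diff)

def manhattan(x, y):
    # sum of the absolute coordinate differences
    return np.sum(np.abs(x - y))

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 6.0, 3.0])
print(euclidean(u, v))  # 5.0
print(manhattan(u, v))  # 7.0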

So, it is |x1 - y1| + |x2 - y2| + |x3 - y3| + ... + |xn - yn|; in shorthand summation notation we can write it as the sum of |xi - yi| over i = 1 to n. Next, the Minkowski distance: for two data points denoted by x and y, the Minkowski distance is defined with a parameter h. It is nothing but the sum of |xi - yi|^h over i = 1 to n, with the whole summation then raised to the power 1/h. Note that for h = 2 the Minkowski distance is the same as the Euclidean distance, and for h = 1 it is the same as the Manhattan distance. The Chebyshev distance for two data points in n-dimensional space is defined as the maximum of the absolute differences of the coordinates.
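
The Minkowski and Chebyshev metrics can be sketched the same way; h is the Minkowski parameter from the transcript, and the comments check the special cases just mentioned:

import numpy as np

def minkowski(x, y, h):
    # (sum_i |xi - yi|^h)^(1/h)
    return np.sum(np.abs(x - y) ** h) ** (1.0 / h)

def chebyshev(x, y):
    # max_i |xi - yi|
    return np.max(np.abs(x - y))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print(minkowski(x, y, 2))  # 5.0, same as the Euclidean distance
print(minkowski(x, y, 1))  # 7.0, same as the Manhattan distance
print(chebyshev(x, y))     # 4.0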

Now, there are other distance metrics, such as the Mahalanobis distance and the Bhattacharyya distance, which are used for advanced statistical pattern recognition tasks. So, what is the kNN classifier? The intuition is that one is known by the company one keeps; let's see what that means. The kNN algorithm starts with all the training samples: the points are available beforehand, which means we know the points in the training data set. Now, when a new test sample arrives, we calculate its distance from all the training points; that is, whenever a new sample arrives like this, we calculate the distance of this new sample from all the available training samples. Then we choose the k nearest neighbors based on the calculated distances.
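
A minimal sketch of these two steps, assuming the training points sit in a NumPy array X_train with one row per sample; the names X_train and k_nearest are illustrative:

import numpy as np

def k_nearest(X_train, x_test, k):
    # distance of the test sample from every training sample
    dists = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # indices of the k training points with the smallest distances
    return np.argsort(dists)[:k]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
print(k_nearest(X_train, np.array([0.5, 0.5]), k=3))  # e.g. [0 1 2]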

So, the nearest neighbors are those whose distances are minimum. We choose, say, the first three, the first five, or just the one nearest neighbor based on the calculated distances; k is usually a positive odd number and is supplied by the user. Now assign the class label of the test sample based on majority voting: that is, for a test sample, if the greatest number of neighbors among those k nearest neighbors belong to one particular class C, then assign the class label of the test sample as C. For example, among the three nearest neighbors of this blue test point, two belong to class two and one belongs to class one. So by majority voting this test point should belong to class two, and as you can see, this point is now labeled as class two. Now, what are the characteristics of the k-nearest neighbor classifier? It does not build a model from the training patterns in advance; rather, when a test instance comes for testing, it runs the algorithm to get the class prediction for that particular test instance.
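
Building on the sketch above, the majority vote over the k neighbors can be done with collections.Counter; y_train holds the class label of each training row, and all names are again illustrative:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k):
    # distances from the test sample to all training samples
    dists = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # labels of the k nearest neighbors
    neighbor_labels = y_train[np.argsort(dists)[:k]]
    # majority vote: the most common class among the k neighbors
    return Counter(neighbor_labels).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([1, 1, 2, 2, 2])
print(knn_predict(X_train, y_train, np.array([5.0, 5.5]), k=3))  # 2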

Hence there is no learning in advance, and so the kNN classifier is also known as a lazy learner. Let us now discuss the decision boundary created by the k-nearest neighbor classifier. The boundary consists of the points that are equidistant between the points of class one and class two. Construct the lines between the closest pairs of points in the two classes, as shown here, and draw the perpendicular bisectors of those lines; note that locally the boundary is linear. Hence the decision boundary of a nearest neighbor classifier is a piecewise linear curve. For multi-class classification, the same thing is done to find the decision boundary: as you can see here, there are three classes, and the decision boundary again looks like a piecewise linear curve.

Now, on choosing the value of k in the k-nearest neighbor classifier: increasing k simplifies the decision boundary, because majority voting implies less emphasis on individual points. For k = 1, you can see how the decision boundary looks: it is a very complicated decision boundary. For k = 3 the decision boundary is still complicated, but not as much as for k = 1. For k = 5 we are getting a simpler decision boundary, and for k = 7 an even simpler one. If we keep increasing the value of k, we will see simpler and simpler decision boundaries. However, increasing the value of k also increases the computational cost. Hence choosing k is a trade-off between how much we simplify the decision boundary and how much computational cost we can bear. Usually k = 5, 7, 9, or 11 works fine for most practical problems. What are the merits of the k-nearest neighbor classifier?
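
One way to see this effect yourself is to train scikit-learn's KNeighborsClassifier for several values of k and color a grid of points by the predicted class. This plotting sketch assumes scikit-learn and matplotlib are installed and uses made-up Gaussian data just to expose the boundary:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# two noisy 2-D classes (made-up data)
X = np.vstack([rng.normal(0, 1.2, (50, 2)), rng.normal(2.5, 1.2, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# grid covering the data, colored by the predicted class for each k
xx, yy = np.meshgrid(np.linspace(-4, 7, 200), np.linspace(-4, 7, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, k in zip(axes, [1, 3, 5, 7]):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    ax.contourf(xx, yy, clf.predict(grid).reshape(xx.shape), alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)
    ax.set_title(f"k = {k}")  # the boundary smooths out as k grows
plt.show()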

The k-nearest neighbor classifier often works very well for practical problems. It is very easy to implement, as there is no complex learning algorithm involved. And it is also robust to noisy data. Now for the demerits: the k-nearest neighbor classifier has several. Choosing the value of k may not be straightforward; often the same training samples are used with different values of k, and we choose the most suitable value of k based on the minimum misclassification error on these samples. It does not work well for categorical attributes. It can encounter problems with sparse training data, that is, when the data points are located far away from each other. And it can encounter problems in very high-dimensional spaces.
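
The k-selection procedure just described (try several values of k and keep the one with the fewest misclassifications on held-out samples) can be sketched with scikit-learn; the data set, split, and candidate list here are illustrative choices, not the course's:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

best_k, best_err = None, np.inf
for k in [1, 3, 5, 7, 9, 11]:
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    err = np.mean(clf.predict(X_te) != y_te)  # misclassification rate
    if err < best_err:
        best_k, best_err = k, err
print(best_k, best_err)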

So, what are the problems in very high-dimensional spaces? Most points lie at the corners, and most points lie at the edges of the space, so distances become less informative. This problem is known as the curse of dimensionality, and it affects many other machine learning algorithms as well. So far, this has been our discussion of the k-nearest neighbor classifier. In the next video, we shall implement the k-nearest neighbor classifier in Python from scratch. See you in the next lecture. Thank you.
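
As a quick numerical illustration of this effect (a sketch with random made-up points, not from the course): as the dimension grows, the nearest and farthest neighbors of a point become almost equally far away, which is exactly what hurts a distance-based method like kNN.

import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.uniform(size=(500, d))  # 500 random points in the unit cube
    # distances from point 0 to all other points
    dists = np.sqrt(np.sum((X - X[0]) ** 2, axis=1))[1:]
    # a ratio near 1 means the nearest and farthest neighbors are almost equidistant
    print(d, dists.min() / dists.max())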
