Hello everyone, welcome to the course on machine learning with Python. In this video, we shall continue our discussion on support vector machines. So, we first start our discussion on the soft margin support vector machine. We have seen the hard margin support vector machine in the last lecture. So, let's see what is called a soft margin SVM. The soft margin SVM is suitable for non-ideal cases, where the data points are not completely separable.
So, let us consider this example. As you can see, a few of the data points are moving here and there; that means this particular data point, if you look at it, is more into class one rather than class two. Similarly, this data point is more into class two rather than class one. So some data points are on the wrong side of the margin. As you can see over here, the standard approach is to allow the decision margin to make a few mistakes. We then pay a cost for each misclassified example, which depends on how far it is from meeting the margin requirement. To facilitate this, we introduce something called a slack variable, written as ξ.
For each data point x_i, a nonzero value of the slack variable ξ_i allows x_i to not meet the margin requirement, at a cost proportional to the value of ξ_i. Here is our modified objective function: find w, b, and ξ_i ≥ 0 such that (1/2) wᵀw — this was our initial cost — plus the additional cost of misclassification, which is nothing but C times the sum of ξ_i for i from 1 to m, is minimized. So the overall cost function to minimize is (1/2) wᵀw + C Σᵢ ξ_i, and our constraint is the same as before, only now it becomes one minus ξ_i: y_i (wᵀ x_i + b) should be greater than or equal to 1 − ξ_i, for all i, okay. As you can see, C here is the controlling parameter. It is usually user defined. A small value of C allows large ξ_i, thereby allowing more x_i vectors to slip through the margin, and a large value of C forces smaller ξ_i, thereby allowing fewer x_i to slip through the margin, okay.

Now, what is called a nonlinear SVM? The SVM we have learned so far works great for linearly separable data sets; hence, it is also called a linear support vector machine. But what are we going to do when the data set is too hard? As you can see, these blue points belong to one class and the red points belong to another class; there is no single hyperplane or straight line that can separate out these two classes of data, okay.
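To make the role of C concrete before we get to the implementation video, here is a minimal sketch using scikit-learn's SVC with a linear kernel. The toy data and the particular C values (0.01 and 100) are illustrative assumptions, not something from the lecture.

```python
# A minimal sketch of the soft margin parameter C, assuming scikit-learn and NumPy are installed.
# Small C tolerates margin violations (large slack values); large C penalizes them heavily.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping 2-D classes, so the data are not perfectly separable.
X = np.vstack([rng.randn(50, 2) + [2, 2], rng.randn(50, 2)])
y = np.array([1] * 50 + [0] * 50)

for C in (0.01, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # With a small C, more points end up inside or on the wrong side of the margin,
    # so more of them become support vectors.
    print(f"C={C}: number of support vectors = {clf.n_support_.sum()}")
```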
So, how about mapping this data x to a higher dimensional space, denoted by φ(x)? Here we will project all these data points into some higher dimensional space, let's say some quadratic space, so that the data set may become linearly separable in the higher dimensional space. So as you can see, now we can pass a straight line like this and the data become linearly separable, okay? And linearly separable data can be easily classified by a support vector machine. So, the general idea is that the original feature space can always be mapped to some higher dimensional feature space where the training set is almost linearly separable. We can see an example like this.
So this is basically two-dimensional data. If we project it into three dimensions, okay, we can pass a plane like this in order to separate these two classes of data, fine. So in a nonlinear SVM, the kernel trick: a kernel function transforms the data points from the lower dimensional space to a higher dimensional space. There are different kinds of kernels; a few of them are mentioned below.
- Linear kernel: the linear kernel of two vectors x1 and x2 is nothing but x1ᵀ x2.
- Polynomial kernel of degree d, denoted by k_d(x1, x2): it is nothing but (x1ᵀ x2 + 1)^d.
- Radial basis or Gaussian kernel, denoted by k_γ(x1, x2): it equals exp(−γ ‖x1 − x2‖²).
- Sigmoid kernel, denoted by k_{a,b}(x1, x2): it is nothing but tanh(a x1ᵀ x2 + b). Note that a and b are scalars.
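As a quick sanity check on these formulas, here is a small sketch that evaluates each kernel on two example vectors with NumPy. The particular vectors and the values of d, gamma, a, and b are arbitrary choices made for illustration.

```python
# A sketch of the four kernel functions listed above, written out with NumPy.
import numpy as np

x1 = np.array([1.0, 2.0])
x2 = np.array([0.5, -1.0])
d, gamma, a, b = 3, 0.5, 0.1, -1.0  # illustrative parameter choices

linear = x1 @ x2                               # x1^T x2
poly = (x1 @ x2 + 1) ** d                      # (x1^T x2 + 1)^d
rbf = np.exp(-gamma * np.sum((x1 - x2) ** 2))  # exp(-gamma * ||x1 - x2||^2)
sigmoid = np.tanh(a * (x1 @ x2) + b)           # tanh(a * x1^T x2 + b)

print(linear, poly, rbf, sigmoid)
```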
So, there are different other kernels; the user can define a novel kernel based on the requirement. However, the aforementioned kernels are suitable for almost all kinds of problems and hence are the most commonly used. So, what are the merits of the support vector machine? It is a strong classifier, which outperforms other classifiers in many classification problems. It is mathematically sound; that means it is a nice optimization problem which is guaranteed to converge to a single global optimum. It can work in a very high dimensional feature space, as its complexity does not depend on the dimensionality of the feature space. And it can work on non-linearly separable cases using a suitable kernel function. On the other hand, the support vector machine demands tuning and is less intuitive; that is, selecting a specific kernel function and its parameters is usually done in a try-and-see manner. So, so far this is our discussion on support vector machines. In the next video, we shall learn how to implement the support vector machine with the scikit-learn library of Python.
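To give a flavor of that try-and-see tuning ahead of the implementation video, here is a minimal sketch that searches over a few kernels and C values with scikit-learn's GridSearchCV. The candidate grid and the use of the iris dataset are assumptions made for this example, not something prescribed in the lecture.

```python
# A sketch of "try and see" kernel/parameter selection via cross-validated grid search.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [0.1, 1, 10],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best kernel/C:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```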
Thank you for your attention. See you in the next lecture.