Hello, everyone. Welcome to the course on machine learning with Python. In this video, we shall learn how to implement a decision tree classifier in Python, using the iris dataset. First, we will import the necessary libraries: from sklearn.datasets we will import load_iris, we will import numpy as np, and we will import random.
And we will import matplotlib.pyplot as plt. So let's go ahead and run this particular cell. Now we shall load the dataset into the matrix X and the vector y. X is basically the feature matrix and y is the label vector: X, y = load_iris(return_X_y=True).
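The imports and data loading described above can be sketched as follows (a minimal version of the notebook cells):

```python
from sklearn.datasets import load_iris
import numpy as np
import random
import matplotlib.pyplot as plt

# Load the iris dataset as a feature matrix X and a label vector y.
X, y = load_iris(return_X_y=True)

print(X.shape)  # (150, 4): 150 samples, 4 features each
print(y.shape)  # (150,): one class label per sample
```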
So X is the feature matrix and y is the label vector; let's go ahead and run this particular cell. Now we shall divide the data into train and test sets. Here we shall take 75% of the data as training samples and 25% as test samples. From sklearn.model_selection we will import train_test_split: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25). Here we have specified test_size=0.25, which means 25% of the dataset will be randomly picked and used as the test set. So let's go ahead and run this particular cell.
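The split step can be sketched like this. Note that the random_state argument is my addition (the video does not set it); it just makes the random split reproducible:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples as a test set.
# random_state is not in the video; it fixes the shuffle for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape)  # (112, 4)
print(X_test.shape)   # (38, 4)
```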
Now, the shape of the training set is (112, 4), which means it contains 112 training samples and four features, or attributes. Similarly, X_test.shape gives us 38 test samples, each with four features or attributes. Next, we shall import the model and fit the training data to it. From sklearn.tree we shall import DecisionTreeClassifier; please note the CamelCase spelling, with capital D, capital T, and capital C in DecisionTreeClassifier. clf is an object of the DecisionTreeClassifier class, and we will fit it on X_train and y_train. Okay.
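Creating and fitting the model looks like this (again with an added random_state for reproducibility, which the video omits):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Create the classifier object and fit it on the training data.
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
```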
There are many parameters of this DecisionTreeClassifier, so let's look at the important ones. The criterion is "gini" by default; however, we can pass criterion="entropy" to use information gain instead of the Gini index as the splitting criterion. We can also specify max_depth, max_features, max_leaf_nodes, and so on. So there are many parameters we can specify here; however, we are keeping criterion="gini", which is the default for the DecisionTreeClassifier object. Okay. So now we shall test the model.
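As a small illustration of these parameters, here is a sketch of constructing the classifier with non-default settings; the specific values max_depth=3 and max_leaf_nodes=8 are arbitrary choices for the example, not from the video:

```python
from sklearn.tree import DecisionTreeClassifier

# criterion defaults to "gini"; "entropy" switches to information gain.
# max_depth and max_leaf_nodes (arbitrary values here) limit tree size.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, max_leaf_nodes=8)

print(clf.get_params()["criterion"])  # entropy
```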
So y_pred = clf.predict(X_test). Now we shall evaluate the model: from sklearn.metrics we will import confusion_matrix. Our confusion matrix is confusion_matrix(y_test, y_pred), where y_test holds the actual, or ground-truth, values and y_pred holds our predicted values. So this is our confusion matrix. Then correct = np.trace(confusion_matrix), which is the sum of the diagonal elements of the confusion matrix, and total is nothing but the sum of all the elements of the confusion matrix.
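The prediction and confusion-matrix evaluation can be sketched as:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = DecisionTreeClassifier().fit(X_train, y_train)

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# np.trace sums the diagonal, i.e. the correctly classified samples;
# cm.sum() is the total number of test samples.
correct = np.trace(cm)
total = cm.sum()
print(correct, total)
```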
So the correctly identified count is given by the trace, and the total is the sum over the entire confusion matrix. Here the correctly identified count is 38 and the total is also 38, so all the test samples have been correctly identified. What will be the accuracy? Surely, the accuracy will be one hundred percent. Now from sklearn.metrics we will import classification_report, and we shall print the output of classification_report, which is a function that accepts the arguments y_test and y_pred.
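The classification report step looks like this. Note that with a random split the report may differ slightly from the all-ones scores seen in the video:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = DecisionTreeClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall, and F1-score, plus overall accuracy.
report = classification_report(y_test, y_pred)
print(report)
```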
So, let's go ahead and run this particular cell. We can see that the precision of class zero is 1, the recall is 1, and the F1-score is 1; similarly, the precision for classes one and two is also 1. The accuracy is also very high, which is 1. Now we can visualize the decision tree, and for this we need to install graphviz. The complete documentation on how to install graphviz can be found here.
If you click on this particular link, you will be redirected to the installation page for graphviz. The command is pip install graphviz. If you open the Anaconda prompt and type pip install graphviz, graphviz will be installed. After the installation of graphviz, you can import graphviz, and from sklearn.tree you can import export_graphviz. So let's go ahead and do these imports. Now, from IPython.display we import display, and this display function accepts a graphviz.Source object.
We call display(graphviz.Source(export_graphviz(clf))), passing our model clf. So let's go ahead and run this particular cell, and as you can see, this is our decision tree. Notice how the splitting criterion changes at each node; gini = 0.0 means a node is pure, or homogeneous. This is how the entire decision tree has been built, and it has very nice interpretability. X[3] is nothing but the fourth feature, and the root node checks whether the value of the fourth feature is less than or equal to 0.8. This condition can be either true or false.
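A sketch of the export step: with out_file=None, export_graphviz returns the DOT description of the tree as a string, which is what graphviz.Source in the video consumes to render the picture. Printing the string works even without Graphviz installed:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = DecisionTreeClassifier().fit(X_train, y_train)

# DOT source describing every node and split of the fitted tree.
# In a notebook: display(graphviz.Source(dot)) renders it as a diagram.
dot = export_graphviz(clf, out_file=None)
print(dot[:60])
```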
If it is true, then we go with class zero. If it is false, then we again check whether the feature value is less than or equal to 1.75 or not. That is how the entire decision tree has been built. Okay. I recommend changing the criterion to entropy and seeing how the decision tree behaves. In the next video, we shall explore another classifier, known as the random forest classifier.
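The suggested experiment of comparing the two criteria could be sketched like this, printing the depth and test accuracy of each tree so the behaviors can be compared side by side:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for criterion in ("gini", "entropy"):
    # Fit one tree per splitting criterion and record its test accuracy.
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X_train, y_train)
    scores[criterion] = clf.score(X_test, y_test)
    print(criterion, "depth:", clf.get_depth(), "accuracy:", scores[criterion])
```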
So, see you in the next lecture. Thank you.