Hello everyone. Welcome to the course on machine learning with Python. In this video, we shall learn how to implement random forest classification in Python. We shall use the breast cancer data set. Now we'll be importing the necessary libraries: from sklearn.datasets import load_breast_cancer, and import numpy as np. Now, loading the data set: breast_cancer = load_breast_cancer(), with empty parentheses.
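The import and loading step described above might look like the following minimal sketch. The variable name breast_cancer follows the lecture; the two print calls are only an illustrative check and are not part of the lecture.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the data set bundled with scikit-learn; the result is a
# dictionary-like object holding the data, targets and metadata.
breast_cancer = load_breast_cancer()

# Illustrative checks (not from the lecture): list the available fields
# and confirm that the feature values are stored as a NumPy array.
print(list(breast_cancer.keys()))
print(isinstance(breast_cancer.data, np.ndarray))
```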
So inside this breast_cancer variable, all the data returned by the function load_breast_cancer shall be loaded. Now we'll be exploring the data set. So breast_cancer.target_names will print all the target names. We have only two classes: one is malignant and the other is benign. Then we shall be printing the description of the breast cancer data set with breast_cancer.DESCR; here DESCR stands for description. We can see that there are a total of 569 instances and a total of 30 numeric attributes, or features. The attribute information is also available.
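A short sketch of this exploration step, assuming the data set is loaded into a variable named breast_cancer as in the lecture:

```python
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

# The two class labels of this binary problem.
print(breast_cancer.target_names)

# Free-text description: number of instances, attributes, class names.
print(breast_cancer.DESCR)
```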
The classes are WDBC-Malignant and WDBC-Benign, so it is a binary classification problem. Okay. Next we'll be loading our feature set X and the target variable y. So breast_cancer.data stores all the feature values and breast_cancer.target stores all the target values. Now X.shape gives us (569, 30), because there are 569 instances and 30 attributes. Now we'll be splitting our data set into training and test data. Here we shall use 75% of the data for training and 25% for testing.
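Extracting the feature set and the target variable can be sketched as follows. The names X and y follow the lecture; printing y.shape is an extra check added here for illustration.

```python
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

X = breast_cancer.data    # feature values, one row per instance
y = breast_cancer.target  # class label for each instance (0 or 1)

print(X.shape)  # (569, 30): 569 instances, 30 attributes
print(y.shape)  # (569,)
```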
Now from sklearn.model_selection we'll be importing train_test_split. Into the train_test_split function we'll be passing the entire data set, X and y, along with test_size equal to 0.25, and we'll be receiving X_train, X_test, y_train and y_test as output. So what is the shape of X_train? It is (426, 30). So there are 426 instances inside the training data set, and there are a total of 143 instances inside the test data set.
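The split described above could be written as in this sketch. The random_state argument is an assumption added here so that the split is repeatable; the lecture does not mention it, and without it the rows chosen for each subset change from run to run while the shapes stay the same.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# Hold out 25% of the rows for testing.
# random_state=0 is an assumption for repeatability, not from the lecture.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape)  # (426, 30)
print(X_test.shape)   # (143, 30)
```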
Now we'll be creating the model. So from sklearn.ensemble, import RandomForestClassifier. Note the camel case: capital R, capital F and capital C in RandomForestClassifier. So this is a class. Now clf is the instance of the class, or the object, and we have specified the number of estimators, n_estimators, equal to 100. The number of estimators is nothing but the total number of trees inside the forest. So we have specified that there are a total of 100 trees inside the forest.
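Creating the classifier object might look like this sketch, assuming 100 trees as stated in the lecture. No training happens at this point; the object only stores its settings.

```python
from sklearn.ensemble import RandomForestClassifier

# clf is an instance of the RandomForestClassifier class;
# n_estimators is the number of decision trees in the forest.
clf = RandomForestClassifier(n_estimators=100)

# Inspect the stored settings before any training takes place.
print(clf.n_estimators)
print(clf.criterion)
```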
Now we shall be fitting the random forest classifier on the training data set with clf.fit. Here you can see that the number of estimators equals 100 and the criterion is gini. Now we shall be predicting with the model. So clf.predict is the method: we'll be passing the entire test data set and we'll be receiving the predicted output. Now we'll be evaluating the model performance, so from sklearn.metrics we'll be importing confusion_matrix.
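Fitting and predicting can be sketched end to end as below. The random_state values are assumptions added for repeatability and do not come from the lecture, so the predictions may differ from those shown in the video.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target,
    test_size=0.25, random_state=0,  # random_state is an assumption
)

# Train 100 trees on the training split.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predict one class label (0 or 1) for every row of the test split.
y_pred = clf.predict(X_test)
print(y_pred.shape)
print(y_pred[:10])
```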
So cm is our confusion matrix; the confusion_matrix function takes two arguments, y_test and y_pred. So this is the confusion matrix. The number of correctly identified samples is equal to np.trace of this confusion matrix, that means the sum of the diagonal elements of the confusion matrix, and the total number of samples is np.sum of the confusion matrix, that means the sum of all the elements of the confusion matrix. So there are a total of 140 samples correctly identified out of 143 test samples. So what is the accuracy? The accuracy is 97.9%. Now from sklearn.metrics we'll be importing classification_report, which is a function that takes two arguments, y_test and y_pred. And let's see what it is. So it will give the precision, recall and F1 score of class zero and class one, that is, malignant and benign. So that is how we have implemented the random forest classifier in Python.
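The evaluation step can be sketched as follows. With the lecture's numbers, the accuracy works out as 140 / 143 ≈ 0.979. The random_state values and the target_names argument to classification_report are assumptions added here, so the exact counts from this sketch need not match the 140 out of 143 reported in the video.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    breast_cancer.data, breast_cancer.target,
    test_size=0.25, random_state=0,  # random_state is an assumption
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

correct = np.trace(cm)  # diagonal sum: correctly classified samples
total = np.sum(cm)      # sum of all entries: number of test samples
accuracy = correct / total
print(f"{correct} of {total} correct, accuracy = {accuracy:.3f}")

# Precision, recall and F1 score for each class.
print(classification_report(
    y_test, y_pred, target_names=breast_cancer.target_names
))
```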
In the next video we shall introduce a new module on dimensionality reduction. So see you in the next lecture. Thank you.