Hello everyone, welcome to the course on machine learning with Python. In this video we shall discuss handwritten digit recognition using an artificial neural network. First we shall study some basics of digital imaging. Digital images are nothing but discretized intensity values associated with discretized spatial coordinates. Digital images are of two types: grayscale images and color images. Usually, digital images are stored in the form of pixels.
Pixels are nothing but picture elements, which are averaged across groups as shown in this figure. Each pixel value is an integer lying between 0 and 255. For a grayscale image, there is only one value associated with each pixel, and this value lies between 0 and 255. For color images, there are three values associated with each pixel: one for red, another one for green, and the last one for blue; each value is an integer between 0 and 255. Now, for grayscale images, extreme black is denoted by 0 and extreme white is denoted by 255. In color images, 0 denotes the absence of the color and 255 denotes the full presence of that color. Next, this is the handwritten digit dataset, called the MNIST dataset; it contains 60,000 training and 10,000 test samples.
Each image is a grayscale image of dimension 28 × 28 pixels. Each training and test sample is also supplied with a corresponding label. There are 10 classes: digit zero belongs to class zero, digit one belongs to class one, and so on, up to digit nine, which belongs to class nine. Next we shall discuss the data pre-processing steps before discussing the architecture of the multilayer perceptron for handwritten digit recognition. In the first step of data pre-processing, we should flatten the images. Originally the training dataset is of dimension 60,000 × 28 × 28, as each image is of dimension 28 × 28 and there are in total 60,000 training images. For each image there are in total 784 pixels. So this is one image; if we flatten that image, it is converted to a one-dimensional array of dimension 784. After flattening, each image will become a one-dimensional array of size 784, and the training dataset shall become of dimension 60,000 × 784. Similarly, if we flatten the test dataset, then that 10,000 × 28 × 28 dimensional dataset shall be of size 10,000 × 784 after flattening. In the next step of data pre-processing, we should normalize the data. The digital image contains integer values between 0 and 255.
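The flattening step described above can be sketched in a few lines of NumPy. Here random arrays stand in for the MNIST images, purely so the snippet runs without downloading the dataset; in practice the arrays would come from a loader such as `keras.datasets.mnist.load_data()`.

```python
import numpy as np

# Stand-in for the MNIST arrays: 60,000 training and 10,000 test
# images, each 28 x 28 pixels (random values here, only the shapes matter).
x_train = np.random.randint(0, 256, size=(60000, 28, 28), dtype=np.uint8)
x_test = np.random.randint(0, 256, size=(10000, 28, 28), dtype=np.uint8)

# Flatten each 28 x 28 image into a one-dimensional array of 784 pixels.
x_train_flat = x_train.reshape(60000, 784)   # or x_train.reshape(-1, 28 * 28)
x_test_flat = x_test.reshape(10000, 784)

print(x_train_flat.shape)  # (60000, 784)
print(x_test_flat.shape)   # (10000, 784)
```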
First we have to convert the data type of the image from integer to float; then we have to divide the value in each pixel by 255. This will ensure that each pixel contains a value between zero and one. Normalization will help to stabilize the gradient descent algorithm. Note that both the training and test data have to be normalized. In the next step of data pre-processing, we should one-hot encode all the labels, that is, the targets. For multiclass classification, each class label, or target value, should be converted to a one-hot encoded vector; each one-hot encoded vector should have dimension equal to the number of classes. So these are the images: this one belongs to class zero, and after one-hot encoding it becomes a vector of dimension 10 whose zeroth element, or the first element, is one. In Python terms this is the zeroth element, because in Python the index starts from zero. The classes run from zero to nine.
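A minimal sketch of this normalization step, again with a random array standing in for the real flattened images:

```python
import numpy as np

# Stand-in for a batch of flattened grayscale images (integers 0-255).
x = np.random.randint(0, 256, size=(1000, 784), dtype=np.uint8)

# Convert from integer to float, then divide by 255 so every pixel
# value lies between 0 and 1.
x_norm = x.astype("float32") / 255.0

print(x_norm.min(), x_norm.max())  # both lie in [0.0, 1.0]
```

Both the training and the test arrays would be passed through the same two lines.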
So, here the class label is zero, and the corresponding element of the one-hot encoded vector is one, while the rest of the values are zero. Here the image belongs to class one, and this element of the vector is one, and all the other elements are zero. Similarly, if the image here belongs to class label five, then, counting 0, 1, 2, 3, 4, 5, the fifth element from the beginning is one, and the rest of the values are zero. Note that in Python it is zero-based indexing, so we have to start from zero: the first element of the array is the zeroth element, and so on. Similarly, for the other classes we shall be doing the same one-hot encoded representation, as the output of the multilayer perceptron will provide a probability distribution.
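The one-hot encoding just described can be done with Keras's `to_categorical` utility, or in plain NumPy by indexing into an identity matrix, as sketched here with a few example labels:

```python
import numpy as np

labels = np.array([0, 1, 5])   # class labels for three sample images
num_classes = 10

# Row i of the 10 x 10 identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes)[labels]

print(one_hot[2])  # digit 5 -> [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```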
Hence, we have to one-hot encode the target variables as well. Now the architecture: it is a single-hidden-layer architecture. Note that in the input layer we have 784 plus one neurons. Why plus one? This plus one is for the bias. Then we have a dense connection to the hidden layer; there is only one hidden layer, and the number of units in the hidden layer is the same as in the input layer. This is followed by an output layer. There are 10 neurons in the output layer because there are 10 classes. Note that the hidden layer and the output layer are also connected by a dense connection. Then comes training. For training we have to specify the batch size; the batch size is limited by the physical memory available in the system and the size of each of the samples.
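To make the "plus one for the bias" concrete, here is a rough NumPy sketch of one forward pass through this architecture: 784 inputs, one hidden layer of 784 units with its own bias vector, and a 10-unit output layer. The random weights and the ReLU/softmax activations are stated for illustration only; a real network would learn the weights by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# One flattened, normalized image (784 pixels), random here.
x = rng.random(784).astype("float32")

# Hidden layer: 784 units, densely connected to the input.
W1 = rng.normal(0.0, 0.01, size=(784, 784))  # small random weights
b1 = np.zeros(784)                           # the bias -- the "+1" neuron
h = np.maximum(0.0, x @ W1 + b1)             # ReLU activation

# Output layer: 10 units, one per digit class, densely connected.
W2 = rng.normal(0.0, 0.01, size=(784, 10))
b2 = np.zeros(10)
z = h @ W2 + b2

# Softmax turns the 10 raw outputs into a probability distribution.
p = np.exp(z - z.max())
p = p / p.sum()

print(p.shape, round(p.sum(), 6))  # a length-10 vector summing to 1
```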
If the available memory in the system is less, choose a smaller batch size, maybe 16 or 32. One can choose a larger batch size, like 128 or 256, if the available memory is more. Next, the number of epochs: the number of epochs should be reasonable. For a less complex dataset, fewer epochs are sufficient; however, more epochs will lead to longer computation time. Then comes the loss function. As we are using one-hot encoded vectors and in the output layer we are expecting a probability distribution, we shall use the categorical cross-entropy loss for training the multilayer perceptron. Then comes the optimizer: we shall be using the Adam optimizer. The optimizer helps to ensure that, while learning via backpropagation, the network does not get stuck in a local minimum; there are other types of optimizers as well. Now, a few practical considerations. Batch normalization: batch normalization helps to obtain faster convergence, though it is not always required. Then comes initialization: weights and biases should be initialized with small random numbers. Then check for convergence: plot the loss function versus the epoch; if it is decreasing at an acceptable rate, then the network is converging.
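Putting these training choices together (categorical cross-entropy loss, the Adam optimizer, a batch size, and a number of epochs), a minimal Keras sketch might look like the following. It assumes TensorFlow/Keras is installed, and it trains on a tiny random stand-in dataset purely so the snippet runs quickly; with the real MNIST data, the flattened, normalized, one-hot-encoded arrays would go in its place.

```python
import numpy as np
from tensorflow import keras

# Single-hidden-layer MLP: 784 inputs, 784 hidden units with ReLU,
# and 10 softmax outputs -- one per digit class.
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(784, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# One-hot targets + softmax output -> categorical cross-entropy loss.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Tiny random stand-in for the normalized, one-hot-encoded MNIST data.
x = np.random.rand(64, 784).astype("float32")
y = np.eye(10)[np.random.randint(0, 10, size=64)]

history = model.fit(x, y, batch_size=32, epochs=2, verbose=0)
print(history.history["loss"])  # one loss value per epoch
```

Plotting `history.history["loss"]` against the epoch number is exactly the convergence check described above.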
If it does not decrease, or decreases very slowly, then we have to modify the network hyperparameters until it shows convergence. Then come the activation functions. ReLU, that is, the rectified linear unit, is the activation function in the hidden units, and softmax will be the activation function in the output layer. Now, TensorFlow and Keras. What is TensorFlow? TensorFlow is a very popular, fast, scalable machine learning library developed by Google. TensorFlow is one of the most sought-after skills in the present scenario. In the context of artificial neural networks, TensorFlow automatically computes the gradients, hence backpropagation becomes easier. The complete documentation of TensorFlow can be found on this website.
Keras with the TensorFlow backend is excellent in performance. TensorFlow has one drawback, and that is its complex syntax, which makes it difficult for beginners. Keras provides an easy API over TensorFlow; it is comparatively much easier to use, and with a few lines of code a user can build and train a neural network very easily with Keras. For the complete documentation of Keras, you can refer to this website. So in the next video we shall learn how to implement handwritten digit recognition in Python. See you in the next lecture. Thank you.