Dataset Preprocessing Function

Practical Deep Learning: Image Search Engine Dataset Preprocessing and Helper Functions
6 minutes

Transcript

Hello, everyone. We now have a way to load images from our disk, which is by using our image loader function. Let's make use of it in this video by implementing a dataset preprocessing function. This will load all images and their respective labels at once. As we discussed in the previous video, I've already prepared the head of the function. So now we will discuss its arguments and start our implementation.

This function takes four arguments. The first one is dataset path, which is the path to the train or the test folder. If we take a look at the dataset directory, it has two subfolders, train and test, and all images are located inside them. We have one more file inside the dataset directory, called labels. If we go back to our dataset preprocessing function, the second argument is the path to this file. Then we have our image size argument, which is the width and height of a single image, same as we had previously in the image loader function. The last thing this function does is save the paths of every single image in our training or testing dataset.

This will help us later on when we need to locate the most similar images and load them. Because of this, we have the fourth argument, image paths pickle. Simply put, this argument is the name of a pickle file where all paths will be stored. Okay, now that we are done with our arguments, let's implement the function. The first thing that we have to do is to get all the classes that we have in our dataset. For that we have to open our labels file: we write with open, provide the path to the labels file, and open it in reading mode. Now we have to create a list of all classes in this file, reading it with the read method and splitting on each newline character.

This will create a list where each element is one line of the file, and in our case each line is a class name. Let's print it out and check the results. As you can see, all classes are here, but we got one bonus class, an empty one. To fix this, we'll put colon minus one here, which will take all elements of our list except the last one. And now we have all our classes.
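The step just described can be sketched as follows. This is a minimal illustration, not the code from the video: the file name `labels.txt` and the three class names are assumptions made up so the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical labels file: one class name per line, ending with a newline.
labels_file_path = os.path.join(tempfile.mkdtemp(), "labels.txt")
with open(labels_file_path, "w") as f:
    f.write("cat\ndog\nfrog\n")

# Read the whole file and split it on newline characters.
with open(labels_file_path, "r") as f:
    raw = f.read().split("\n")

print(raw)      # ['cat', 'dog', 'frog', ''] (the trailing newline yields an empty last element)

# Slice with [:-1] to keep everything except the last (empty) element.
classes = raw[:-1]
print(classes)  # ['cat', 'dog', 'frog']
```

Note that the `[:-1]` slice assumes the file ends with a newline. If it does not, the slice would drop a real class name; `f.read().splitlines()` is one alternative that handles both cases.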

The next thing that we will do is define three empty lists, which we'll name images, labels and image paths. Images will hold all the images, labels will hold the respective class for each image, and image paths will hold the path to each image; this is the list that will be stored in a pickle file. Let's fill out the return statement of the function by returning a NumPy array (np.array) of images and a NumPy array of labels. The reason we are using NumPy arrays instead of regular lists is that we will need some NumPy functions for fast array manipulation, and we wouldn't be able to use them with regular Python lists. Right before the return statement, however, we will assert that len of images is equal to len of labels. If you want to make sure that one part of the code is correct, or to run some check, in this case that we have the same number of elements in the images list as in the labels list, you use assert.
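As a small sketch of the check and the conversion described here, the snippet below uses two tiny stand-in "images" (an assumption for illustration, not real image data) and shows what happens when the assert fails.

```python
import numpy as np

# Two tiny stand-in "images" (1 pixel, 3 channels each) and their labels.
images = [[[0, 0, 0]], [[255, 255, 255]]]
labels = [3, 7]

# Passes silently because both lists have the same number of elements.
assert len(images) == len(labels)

# Convert the Python lists to NumPy arrays before returning them.
images_arr = np.array(images)
labels_arr = np.array(labels)
print(images_arr.shape)  # (2, 1, 3)
print(labels_arr.shape)  # (2,)

# A failing assert raises AssertionError, so the code after it does not run.
try:
    assert len(images) == len(labels[:1]), "images and labels differ in length"
    message = "not reached"
except AssertionError as err:
    message = str(err)
print(message)  # images and labels differ in length
```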

If this statement is not true, the rest of the code won't be executed. Okay, back to the body of the function. We'll open a for loop and go through all images in our dataset folder. We'll make a try-except statement here and simply pass if we encounter an error. In the try block, we will append images to our list of images, and we will utilize our image loader function for this purpose. It takes image path and image size as arguments. The image path can be built using os.path.join, where we join the dataset path and the image name.

We pass the image path as an argument to the image loader, as well as image size, which is also an argument of the function that we are building right now. Next up, we append the path of the current image to our image paths list. As you can see, this is how the dataset is built: in our case, each image has its class name in its file name. So now we need to go through all of our classes and check whether a class name appears in the image name. We open a for loop and go through all classes.

And we check if any of the class names is found in the image name. We'll write an example here: 0_frog.png is just an example of an image found in our dataset. If there is a match, we append the index of the current label, since we only need indices. The next thing we should do is save all of these paths to our pickle file. Here we'll open a new file, using the image paths pickle name with .pickle appended to it, and open it in binary writing mode. We'll use pickle.dump, which takes an object, in our case the list of paths, and the file which is opened. Execute the cell. And let's store the results of the function inside images and labels. Execute the function; it's going to take a bit to run.
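Putting the steps from this walkthrough together, the function might look roughly like the sketch below. Several things here are assumptions rather than the video's code: the function and argument names are transcribed from speech, `os.listdir` is assumed for iterating the folder, `image_size` is assumed to be a (width, height) tuple, and `image_loader` is a stand-in that returns a blank array instead of decoding a real image.

```python
import os
import pickle
import tempfile

import numpy as np


def image_loader(image_path, image_size):
    # Stand-in (assumption): the real loader from the previous lesson reads and
    # resizes the image. This one only returns a blank array of the right size.
    width, height = image_size
    return np.zeros((height, width, 3), dtype=np.uint8)


def dataset_preprocessing(dataset_path, labels_file_path, image_size, image_paths_pickle):
    # Read class names, dropping the empty element after the final newline.
    with open(labels_file_path, "r") as f:
        classes = f.read().split("\n")[:-1]

    images, labels, image_paths = [], [], []

    for image_name in os.listdir(dataset_path):
        try:
            image_path = os.path.join(dataset_path, image_name)
            images.append(image_loader(image_path, image_size))
            image_paths.append(image_path)
            # The class name is part of the file name, e.g. "0_frog.png".
            for idx, class_name in enumerate(classes):
                if class_name in image_name:
                    labels.append(idx)
        except Exception:
            # As in the walkthrough, errors are simply skipped.
            pass

    # Store all image paths so similar images can be located and loaded later.
    with open(image_paths_pickle + ".pickle", "wb") as f:
        pickle.dump(image_paths, f)

    assert len(images) == len(labels)
    return np.array(images), np.array(labels)


# Tiny demo with a made-up dataset folder containing two empty files.
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "train")
os.makedirs(train_dir)
for name in ("0_frog.png", "1_cat.png"):
    open(os.path.join(train_dir, name), "wb").close()
labels_file = os.path.join(root, "labels.txt")
with open(labels_file, "w") as f:
    f.write("cat\nfrog\n")

images, labels = dataset_preprocessing(
    train_dir, labels_file, (32, 32), os.path.join(root, "train_paths")
)
print(images.shape)  # (2, 32, 32, 3)
print(labels.shape)  # (2,)
```

One thing to keep in mind with this substring matching: a file name that contains no class name, or more than one, leaves the two lists with different lengths, and the assert before the return will then fail.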

So I'll get back to you when it finishes. Okay, the function is done on my end. So let's check the shape of our training set images. There we go. As we discussed, there are 50,000 images here. Each image is 32 by 32, with three color channels.

And that's it for this tutorial. If you have any questions or comments, please leave them in the comment section. Otherwise, see you in the next tutorial.
