Cosine Distance

Practical Deep Learning: Image Search Engine Dataset Preprocessing and Helper Functions

Transcript

Hello everyone. In this video we are going to talk about the cosine distance function and how it will be incorporated into the image-to-image search pipeline that we are going to create. Because we want to compare images directly, we will create training set vectors on one side, where each image in the training set has its own vector representation. On the other side, we have an image newly uploaded by the user and its own vector representation. By comparing the vector representations of the images, we can see whether these vectors are similar, or close to one another, which represents the similarity of the compared images. And that's where the cosine distance function comes into play.

This is the formula that is used to calculate the cosine distance between two vectors. As you can see on this graph, cosine similarity compares two vectors by measuring the angle between them. The smaller the angle between the vectors, the bigger the similarity. For example, if the angle is zero, the cosine of zero is one, meaning those two vectors point in the same direction and are treated as identical; the cosine distance, which is one minus the similarity, is then zero. Now that we know how cosine distance works, or at least the basic intuition behind it, let's write the function itself. While writing this function, we should keep in mind that we have to compare one query image to all images in our training set. Before we determine the most similar images, the first thing that we have to define is an empty list called distances.
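The formula shown in the video is not reproduced in this transcript. For reference, the standard definitions of cosine similarity and cosine distance between two vectors u and v, with θ the angle between them, are usually written as:

```latex
\[
\text{similarity}(u, v) = \cos\theta = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert},
\qquad
\text{distance}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
\]
```

With θ = 0 the similarity is 1 and the distance is 0; with perpendicular vectors the similarity is 0 and the distance is 1.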

In this list, we are going to store all the calculated distances. Because we have to compare each and every training set sample with the query vector, we have to go through the whole training set, and we can do that by simply using a for loop. In the case of the CIFAR-10 dataset, we have 50,000 training set images. We are going to use the cosine distance function from the SciPy library that we have imported already. It takes two arguments, the vectors u and v: u will be our current training set vector, and v our query vector. Now, we don't really need the distance values, but the indices of the images that are closest to our query image. So we will return np.argsort of distances.
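The loop described above could look roughly like the sketch below. The function name compute_distances and the tiny two-dimensional vectors are made up for illustration and are not taken from the video; only scipy.spatial.distance.cosine and its (u, v) arguments come from the lesson.

```python
import numpy as np
from scipy.spatial.distance import cosine


def compute_distances(training_set_vectors, query_vector):
    """Return the cosine distance from query_vector to every training vector.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    distances = []  # the empty list mentioned in the video
    for training_vector in training_set_vectors:
        # u is the current training set vector, v is the query vector
        distances.append(cosine(training_vector, query_vector))
    return distances


# Made-up toy data standing in for real image vectors.
training_set_vectors = np.array([
    [1.0, 0.0],   # same direction as the query
    [0.0, 1.0],   # perpendicular to the query
    [1.0, 1.0],   # 45 degrees from the query
])
query_vector = np.array([1.0, 0.0])

distances = compute_distances(training_set_vectors, query_vector)
print(distances)
```

For 50,000 training vectors this plain Python loop makes 50,000 calls to cosine, which is simple to read but may be slow; a vectorized alternative is possible but is not what the video describes.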

This function takes an array and sorts it, but instead of returning the sorted array, it returns the sorted indices, which is exactly what we need. And finally, let's return the top n indices. And we are done. If you have any questions or comments so far, please post them in the comment section. Otherwise, I'll see you in the next tutorial.
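As a small illustration of the sorting step, here is how np.argsort behaves on a short list of distances. The values and the variable names top_n and closest_indices are invented for this sketch.

```python
import numpy as np

# Made-up distances for four training images (index 0 to 3).
distances = [0.42, 0.05, 0.90, 0.13]

# argsort returns the indices that would sort the array in ascending order,
# so the index of the smallest distance comes first.
sorted_indices = np.argsort(distances)

# Keep only the top n closest images.
top_n = 2
closest_indices = sorted_indices[:top_n]

print(sorted_indices)   # [1 3 0 2]
print(closest_indices)  # [1 3]
```

Because smaller cosine distance means more similar, ascending order puts the best matches first, so slicing the front of the array gives the top n results.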
