APPLICATION - Extracting the Product Names, Links and Prices. Saving to Excel

Python 3: Automating Your Job Tasks Superhero Level: Automate Web Scraping with Python 3
12 minutes
Share the link to this page
Copied
  Completed
You need to have access to the item to view this lesson.
One-time Fee
$99.99
List Price:  $139.99
You save:  $40
€92.16
List Price:  €129.04
You save:  €36.87
£78.95
List Price:  £110.53
You save:  £31.58
CA$136.82
List Price:  CA$191.55
You save:  CA$54.73
A$150.88
List Price:  A$211.24
You save:  A$60.35
S$134.99
List Price:  S$189
You save:  S$54
HK$780.84
List Price:  HK$1,093.20
You save:  HK$312.36
CHF 92.16
List Price:  CHF 129.04
You save:  CHF 36.87
NOK kr1,065.65
List Price:  NOK kr1,491.95
You save:  NOK kr426.30
DKK kr688.26
List Price:  DKK kr963.60
You save:  DKK kr275.33
NZ$163.32
List Price:  NZ$228.66
You save:  NZ$65.33
د.إ367.26
List Price:  د.إ514.18
You save:  د.إ146.92
৳11,714.86
List Price:  ৳16,401.27
You save:  ৳4,686.41
₹8,305.04
List Price:  ₹11,627.39
You save:  ₹3,322.35
RM470.40
List Price:  RM658.58
You save:  RM188.18
₦146,115.27
List Price:  ₦204,567.22
You save:  ₦58,451.95
₨27,783.57
List Price:  ₨38,898.11
You save:  ₨11,114.54
฿3,658.80
List Price:  ฿5,122.47
You save:  ฿1,463.66
₺3,219.48
List Price:  ₺4,507.41
You save:  ₺1,287.92
B$516.66
List Price:  B$723.35
You save:  B$206.68
R1,841.76
List Price:  R2,578.54
You save:  R736.77
Лв180.52
List Price:  Лв252.74
You save:  Лв72.21
₩136,656.33
List Price:  ₩191,324.33
You save:  ₩54,668
₪366.73
List Price:  ₪513.44
You save:  ₪146.71
₱5,819.86
List Price:  ₱8,148.04
You save:  ₱2,328.18
¥15,692.92
List Price:  ¥21,970.72
You save:  ¥6,277.79
MX$1,669.81
List Price:  MX$2,337.80
You save:  MX$667.99
QR364.18
List Price:  QR509.87
You save:  QR145.68
P1,356.93
List Price:  P1,899.76
You save:  P542.82
KSh13,033.20
List Price:  KSh18,247
You save:  KSh5,213.80
E£4,712.01
List Price:  E£6,597.01
You save:  E£1,884.99
ብር5,738.47
List Price:  ብር8,034.09
You save:  ብር2,295.62
Kz84,827.41
List Price:  Kz118,761.77
You save:  Kz33,934.36
CLP$90,869.43
List Price:  CLP$127,220.83
You save:  CLP$36,351.40
CN¥710.97
List Price:  CN¥995.39
You save:  CN¥284.42
RD$5,881.38
List Price:  RD$8,234.17
You save:  RD$2,352.78
DA13,438.44
List Price:  DA18,814.36
You save:  DA5,375.91
FJ$226.74
List Price:  FJ$317.44
You save:  FJ$90.70
Q775.80
List Price:  Q1,086.15
You save:  Q310.35
GY$20,895.31
List Price:  GY$29,254.27
You save:  GY$8,358.96
ISK kr13,791.40
List Price:  ISK kr19,308.52
You save:  ISK kr5,517.11
DH996.21
List Price:  DH1,394.74
You save:  DH398.52
L1,771.78
List Price:  L2,480.57
You save:  L708.78
ден5,676.40
List Price:  ден7,947.19
You save:  ден2,270.78
MOP$803.71
List Price:  MOP$1,125.23
You save:  MOP$321.51
N$1,835.52
List Price:  N$2,569.81
You save:  N$734.28
C$3,675.86
List Price:  C$5,146.35
You save:  C$1,470.49
रु13,281.76
List Price:  रु18,594.99
You save:  रु5,313.23
S/373.37
List Price:  S/522.73
You save:  S/149.36
K388.13
List Price:  K543.39
You save:  K155.26
SAR375.02
List Price:  SAR525.04
You save:  SAR150.02
ZK2,665.60
List Price:  ZK3,731.94
You save:  ZK1,066.34
L458.57
List Price:  L642.02
You save:  L183.44
Kč2,271.01
List Price:  Kč3,179.50
You save:  Kč908.49
Ft35,372.15
List Price:  Ft49,522.43
You save:  Ft14,150.27
SEK kr1,086.34
List Price:  SEK kr1,520.92
You save:  SEK kr434.58
ARS$88,911.95
List Price:  ARS$124,480.29
You save:  ARS$35,568.34
Bs690.12
List Price:  Bs966.19
You save:  Bs276.07
COP$386,435.76
List Price:  COP$541,025.52
You save:  COP$154,589.76
₡51,204.17
List Price:  ₡71,687.89
You save:  ₡20,483.71
L2,468.08
List Price:  L3,455.41
You save:  L987.33
₲751,203.84
List Price:  ₲1,051,715.43
You save:  ₲300,511.59
$U3,849.38
List Price:  $U5,389.29
You save:  $U1,539.90
zł392.63
List Price:  zł549.70
You save:  zł157.07
Already have an account? Log In

Transcript

Hi, and welcome to this video. In this lecture we are going to build, analyze and test a web scraping application that will browse for and extract information about the products listed on a test page. The application will grab the product name, the URL link of that product, and also its price from the webpage, then it will use this data to build a panda's data frame. And finally, it will write that data frame to an Excel spreadsheet that generating a nicely formatted report or database, which is actually the goal of every web scrapping script. As a side note, if you're not familiar with pandas data frames, please go through the data analysis section of this course. And then you will be able to move on with building this application.

Assuming you already went through that section and you're comfortable using pandas data frames. Let's move further and see the webpage that we are going to scrap so the This is the webpage right here. By the way, I have attached the URL of this test page to this video. So you can either type in the link yourself or use the attached link. So here we have a web page listing 21 tablets. For each device, we have its name, a short description, a price, a link, if you hover your mouse over the name, then we have the review score.

And finally, the number of reviews. This is how most ecommerce websites look like. And we are going to use this exact page to test our application against next let me show you the code for this application. So this is it right here in Notepad plus plus, we have exactly 50 lines of code, including comments and blank lines. Now let's start analyzing the code. You can either choose to type in the code in your own preferred text editor, or you can copy and paste the code from the upcoming notebook.

More over you're going to find the Python script itself also attached To the notebook following this video. Okay, let's get to the fun part. Now. First of all, as always, you have to import the necessary modules. In our case, we need to import the pandas and request modules, as well as the beautiful subclass from within the ps4 module. The next four lines of code, these ones right here are pretty easy to understand since we've already discussed them in the previous videos of this section.

Basically, we are passing the link of the web page to the get method. And then we are loading and parsing the page content. And finally, we are building a result set object by searching for all the div tags that match the class passed as a dictionary value in the second argument of the Find all method right here. Now let's check the web page again, to verify that this is indeed the class that corresponds to each of the 21 products on the page. I'm going to right click on one of these products and hit inspect and indeed If we look at the div itself, we have class called SM four called LG four called MD four, which is exactly the same class I have in my code. Okay, this is correct.

Next, notice that I have created three empty lists, called names, links and prices. Each of these lists will get populated with the names of all the products on the page, as well as the URL link and the prices of each of these products respectively. to populate this list, we have to iterate over the products variable, which is actually our result set object containing the data for all 21 tablets on the page right. Now let's see what each of these products looks like in HTML code, and I'll switch to idle and execute the code that we've discussed thus far in this video. Okay, at this point, let's extract and print out the first product from our products result set. So let's use print of products.

Let's use index zero because I said we want the first product dot prettify open and close parentheses, enter. So as you can see, it is the Lenovo idea tab at a price of 6999. Is this the first product listed on the webpage? Let's check this. And yes, indeed it is. Great.

Now at the beginning of this video, I said that for each product, we want to extract its name, the URL link to the product, and also its price. Let's do that for the first product in the list based on the HTML code that we're seeing on the screen right now, and the things you've learned throughout this section. First, we see that the name of the device is included in this a tag right here. So to get to that tag, and also extract only the string enclosed by that tag, we can use the following code products of zero dot A the name of the Tag not string. Okay, great. Now what about the link to this product, that would be the underlying URL of the same a tag right here.

So in order to extract the value of that attribute, all we have to do is to treat the a tag as a dictionary, as we did earlier in this section, and then extract the attribute value using the attribute name as a dictionary key. So simple enough, we will use products of zero dot A. And now the key, the name of the key href, which as I said is the name of the attribute up here. Let's hit enter. Now since this is not a valid link, we will have to prepend it with the domain name and extension of this website by simply performing string concatenation. Let me show you how to do this.

So we have HTTPS, colon slash slash, the name of the website and then we are concatenating this string With the string above, let's hit enter. Okay, great. Finally, we need to extract the price of the product as well. As you can clearly see the price is enclosed in an h4 tag. So let's go ahead and extract it using the string object attribute once again, this time applied to the h4 tag. So that would be productive zero dot h for dot string, Enter.

And great job we got the price as well. Now let's get back to our application code. And notice that the operation we just did in the interpreter are the ones performed for each product in the product result set using a basic for loop. So this is the for loop right here. Additionally, for each product in the list, we are appending the specific name, link and price information to the correct list. As soon as all the product information is extracted and saved to the correct list, we have to combine the data in the three lists.

Above, so that the name link and price of each product gets grouped together into one unique container. Let's say this is easily done using the zip function. As you can see, I did right here. This function takes the three lists as arguments and combines the first element of list one with the first element of list two with the first element of list three into a single topple, then it does the same for all the elements and puts all the resulting topples into a list. Let me show you what I mean by using a basic example in the interpreter. So let's assume we have list one equals 123. list two equals 1020 and 30 and list three equals 100 203 hundred.

Now we want to have 110 and 100 inside the same container, then we want the same thing. 220 and 203 30 and 300, and so on. For this, all we have to do is use the zip function and pass the three lists as arguments to the function. So let's try this zip of list one, list two, and list three. Now notice that we got a zip object, not a plain list. To convert this to a list, you just have to simply use the list function like this.

So list of zip of list one, list two, list three, enter, and there's the result we were looking for. Similarly, we want the same operation to be performed on our lists in our application, so that we can have the name link and price of the first product in the first couple. Then the same information for the second product in the second topple and so on until the products are exhausted. Now let me create the same three lists in the interpreter and then populate them and zip them together. As we are doing inside the application code, so we have names, links, and prices. And now let me just copy and paste this for loop into the interpreter.

Finally, let me copy this line as well. So data equals list of zip of names, links and prices. Let's see data. Okay, so this is the list of tuples in our application, where each topple represents a product. Notice the first topple here with name, link, and price, then the second topple, and so on. Since we have 21 products on the page, the length of this list should be 21.

Right? So let's use Len of data to confirm this. And indeed it is great. Now back to the application code once again. The next thing I did is I have created a panda's data frame on this row here, using the data stored as a list of tables. also adding the columns name For each of the three columns, this is going to get written to the Excel file as well as column headers.

Last but not least, I'm using the to excel method from within the pandas module. To write the contents of this data frame to a brand new Excel file. All you have to do is enter the back and name of the file, and pandas will write the data to that file. If the file does not exist, then pandas will go ahead and create it for you and then it will populate it with the desired information. The final thing to notice here in my code is the use of these try except else finally block. Basically Python will try to write the data frame to the Excel file.

If for some reason an exception gets raised, then the program will print out this string to the screen, the one under the except clause. On the other hand, if the code under the try statement gets executed successfully, then the string under the else clause gets executed as well and printed out to the screen. Finally, regardless of the execution success status of the code under the try statement, the string specified under the finally clause right here gets executed. So like it or not, the spring function will be called No matter if the above try accept code raises an exception or not. That's it. That's our application, it's time to test it.

First, let's go to the folder I have created on my D drive to check if there's any Excel spreadsheets saved there. And it isn't great. Now let's open up the windows command line and run our script. By the way, don't worry about the second script right here. We are going to discuss it in the next lecture. For now we are going to run the web scraper.py script.

So let's open up the windows cmd. And I'm going to write in Python. The web scrapping web scraper.py enter. Okay, so we have web data successfully written to excel. quitting the program. So you should first notice the strings under the else and finally clauses being printed out to the screen.

Now let's check the folder once again. And here's the brand new Excel file. Let's open it. Okay. And indeed, we have all the products 21 lines right here starting at index zero up to index 20, along with their names, URL links and prices being saved inside this file. Also notice the column headers right here that we've configured in our code.

So congratulations on building your first web scraper with Python and beautifulsoup. I'll see you in the next lecture where we are going to upgrade this application.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.