Extracting and Parsing Web Content

Python 3: Automating Your Job Tasks Superhero Level: Automate Web Scraping with Python 3
4 minutes


Okay, welcome back. In this lecture, we are going to import the modules that we've just installed. Then we are going to make a request and get a web page from a website that is specially designed for web scraping tests. And finally, you're going to see what the extracted content looks like in the Python interpreter. Without further ado, let's get to work. First of all, let's open up IDLE and import the two modules that we discussed in the previous video. So, import requests. And secondly, from bs4 import BeautifulSoup.

Notice that we need to use bs4 to import the BeautifulSoup class, where bs4 stands for Beautiful Soup version 4, which is the current version of this module. Now let me show you a great website that I found, which is designed for testing your web scraping applications. So that would be webscraper.io/test-sites/e-commerce. And for this video we are going to use this page right here, which is /static. So how can we get this page loaded into our Python environment?

Well, simple enough: using the requests module and the get function from within this module. Therefore, I'm going to type in the line of code that performs this task, and then we will discuss it. Okay, so this is the line of code that we need right here. What I did here is I used the get function prefixed by the name of the module, so requests.get. And in between the parentheses of get I've simply pasted the URL of the page that I want to load. Also, I used a variable called webpage to reference this object.

I say object because this is called a Response object. Actually, if you use the type function on this variable, so type(webpage), you can see that we got, as I said, a Response object. Next, we can view the content of this page either as a string or as bytes, so two different data types that we can choose from. To see the content as a string, just use webpage.text. Actually, let me assign this to a variable; let's call it text. So text = webpage.text, Enter.

Now let's confirm that this is indeed a string using the same old type function. So type(text). Okay, great. Now let's see the page content itself. So text, Enter.
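
The steps so far can be sketched as a short script. This is a minimal sketch, not the exact code typed in the video: the full URL is assembled from the address read out above, and the fetch_page helper, the timeout value, and the raise_for_status check are assumptions added for illustration.

```python
import requests

# Assumed full URL, assembled from the address read out in the lecture.
URL = "https://webscraper.io/test-sites/e-commerce/static"


def fetch_page(url, timeout=10.0):
    """Download a page and return the requests Response object.

    fetch_page is a hypothetical helper; the lecture calls requests.get directly.
    """
    webpage = requests.get(url, timeout=timeout)
    webpage.raise_for_status()  # raise an exception on 4xx/5xx replies
    return webpage
```

In the interpreter, webpage = fetch_page(URL) gives the Response object discussed above, and text = webpage.text gives the page as a string, so type(webpage) and type(text) can be checked exactly as in the lecture.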

Let me double click on this button right here. Okay, this is it. If you're familiar with HTML, then you should feel comfortable reading and working with this syntax. Otherwise, if you have no idea about HTML, then you should search for a basic free online tutorial to grasp the essential concepts, so that you will have the necessary skills to start performing web scraping tasks and building applications. Next, let's see how to view the content as bytes as well. So simple enough, let's say content = webpage.content, Enter.

Now, let's check the type of content. So type(content), which is bytes, as expected. Great. Now let's display the web page content once again. So content, and again, let's click this button right here.
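
The difference between the two views comes down to encoding: the text view is a str of characters, while the content view is the raw bytes. A small offline sketch, using a made-up HTML snippet rather than the real page and assuming UTF-8 encoding:

```python
# A made-up HTML fragment standing in for a downloaded page.
html_text = "<h1>Café menu: €5</h1>"

# Encoding turns the str into bytes, which is the form sent over the network.
html_bytes = html_text.encode("utf-8")

print(type(html_text))   # <class 'str'>
print(type(html_bytes))  # <class 'bytes'>

# Non-ASCII characters take more than one byte, so the lengths differ.
print(len(html_text) < len(html_bytes))  # True

# Decoding with the same encoding recovers the original string.
assert html_bytes.decode("utf-8") == html_text
```

When you read the text attribute of a response, requests performs this decoding step for you, using the encoding it determines for that response.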

So this is the content. Okay, awesome. Finally, we can use the BeautifulSoup class to parse the extracted web page using a parser, so that you will be able to make use of the Beautiful Soup module to work with websites and handle HTML content. For instance, using the content variable from above, let's create a new object; I'm going to call it result. So result = BeautifulSoup(content).

Now, if we display result, the content already looks pretty familiar, doesn't it? Okay. Now, if we check the type of result, so type(result), then we can see that this is called a BeautifulSoup object, representing the entire content of the web page. Now keep in mind that the Beautiful Soup module can work with several HTML or XML parsers. And you can specify the desired parser right after the comma, in between double quotes, when creating a BeautifulSoup object, like this: so comma, and now in between double quotes, html.parser. This is perhaps the most widely used parser and we are going to use it throughout this section for our application, since it does its job properly and raises no errors or exceptions.
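
Putting the parser argument into practice, here is a minimal sketch that parses a small hand-written HTML byte string instead of the downloaded page, so it runs without a network connection. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (bytes, like webpage.content would be).
content = b"<html><head><title>Test shop</title></head><body><p>Hello</p></body></html>"

# Name the parser explicitly as the second argument.
result = BeautifulSoup(content, "html.parser")

print(type(result))         # <class 'bs4.BeautifulSoup'>
print(result.title.string)  # Test shop
print(result.p.get_text())  # Hello
```

Leaving out the parser argument still works, but Beautiful Soup then picks a parser on its own and warns that it had to guess, so naming it keeps results consistent across machines.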

Okay, enough for now. I hope you enjoyed this video and I will see you in the next one.
