Tags, Names and Attributes

Python 3: Automating Your Job Tasks Superhero Level: Automate Web Scraping with Python 3
10 minutes

Transcript

Hi, and welcome back. Following up on the tasks we performed in the previous video, in this lecture you're going to learn how to extract various pieces of information from a web page. This is going to be both very interesting and very useful. So let's get started. First of all, let's go to the target web page, the one I mentioned in the previous lecture. You can also find the link attached to this lecture. On this web page, right-click anywhere and select View Page Source.

This option is available in every major browser. I'm using Google Chrome here, but if you're using Firefox or Safari, the same feature should be available, perhaps under a slightly different name. As soon as I hit View Page Source, a new tab opens where you can see the entire HTML code of this particular page. Leave this tab open, and let's switch to the Python command line for a bit to make sure we have everything set up and ready for action. Here, I am just reusing the code we already discussed in the previous lecture.
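The setup code from the previous lecture isn't reproduced in this transcript. Below is a minimal sketch of what it might look like. It assumes the beautifulsoup4 package is installed and keeps the transcript's variable name, result, for the parsed page. So that it runs offline, it parses a small hypothetical HTML string instead of the real target page.

```python
from bs4 import BeautifulSoup

# In the lecture, the HTML is downloaded first, roughly like:
#     import requests
#     html = requests.get(url).text
# To keep this sketch runnable offline, we use a tiny hypothetical page instead.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h2>Top items being scraped right now</h2>
  </body>
</html>
"""

# "html.parser" is the parser that ships with Python's standard library
result = BeautifulSoup(html, "html.parser")
print(type(result).__name__)  # BeautifulSoup
```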

At this point, we are ready to start extracting data from this web page using Beautiful Soup, and we will perform this extraction based on HTML tags. Back to the browser now. Let's say that we want to get the contents of the head tag right here. For this, just go back to IDLE and use the Beautiful Soup object to get the desired outcome. All we have to do is type result.head.

That's it: that is the content of the head section of your web page. Pretty easy, right? We can also assign this to a variable for easier reference in our application code, so let's write head = result.head.

Let's also check the type of this variable with type(head). This is called a Tag object. Another important thing to keep in mind is that this operation returns only the first tag with that name. In other words, you don't get all the tags with that name, only the first one that appears on the web page. Now back to the website.
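Here is a short sketch of the three points above: attribute-style access, the Tag type, and "first match only". It uses a hypothetical page with two p tags to make the first-match behavior visible.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical page with two <p> tags, to show that only the first is returned
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>
"""
result = BeautifulSoup(html, "html.parser")

head = result.head   # equivalent to result.find("head")
print(head)          # <head><title>Sample page</title></head>
print(type(head))    # <class 'bs4.element.Tag'>

# Attribute-style access stops at the first match
print(result.p)      # <p>First paragraph</p>
```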

Let's do the same thing for a different tag, say an h2 heading. Let's search the HTML code for the first h2 heading. This is it right here. Let me increase the font a bit. This is the h2 heading we are going to obtain by accessing a different attribute in the Python interpreter. Simple enough: just type h2 = result.h2.

Now let's look at h2. We get "Top items being scraped right now". Going back to the web page, this is indeed the result we were looking for. Great. What about the title tag, for instance? Let's search for title.

This is the result we expect to obtain in the Python interpreter. Let's check it: title = result.title, then Enter. Now let's look at title, and sure enough, we get the title tag and its content.
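The h2 and title lookups above can be sketched as follows. The page is hypothetical and includes a second h2 to show that result.h2 returns only the first one.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two <h2> headings plus a <title> in <head>
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h2>Top items being scraped right now</h2>
    <h2>A later heading</h2>
  </body>
</html>
"""
result = BeautifulSoup(html, "html.parser")

h2 = result.h2        # only the first <h2>
title = result.title  # the <title> tag

print(h2)     # <h2>Top items being scraped right now</h2>
print(title)  # <title>Sample page</title>
```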

What else? We can also access the name of each tag through its name attribute. Let's try head.name, h2.name, and also title.name. Of course, each one returns the name of the tag you are referring to. No surprises so far. But what if you want to change the name of a tag in your local copy of the web page?

Well, that's also very easy to do. Just assign a new value to the name attribute with the equals sign. For instance, for the title tag, we can write title.name = "mytitle". Now, if we check this tag once again, we'll see the name change in effect. Let's look at title, and indeed, this is the correct result.
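Reading and renaming tags, as described in the last two paragraphs, can be sketched like this. The new name "mytitle" is just an example. The rename only affects the parsed copy in memory, so afterwards no tag called title exists in that copy.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Sample page</title></head><body><h2>Heading</h2></body></html>"
result = BeautifulSoup(html, "html.parser")
head, h2, title = result.head, result.h2, result.title

# .name is the tag's name as a plain string
print(head.name, h2.name, title.name)  # head h2 title

# Assigning to .name renames the tag in the parsed copy only (not on the website)
title.name = "mytitle"
print(title)         # <mytitle>Sample page</mytitle>
print(result.title)  # None, since no tag is called "title" any more
```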

Now let's talk about HTML tag attributes themselves. Each tag may have any number of HTML attributes. For example, going back to our web page, let's consider the header tag. This tag has two attributes. The first one is role, and it has the value banner. The second one is class, which has three values: navbar, navbar-fixed-top and navbar-static. That's why, in this case, role is called a single-valued attribute.

And class is called a multi-valued attribute. First of all, let's extract the header tag and its contents, as we did earlier with other tags. Let's go to the Python interpreter and type header = result.header. Now let's look at header. Let's double-click on this button to expand the output. Okay, this is the header right here.

By the way, you can also use the built-in prettify() method to have the header content shown in a much nicer way. Let me show you: print(header.prettify()), then Enter. Okay, this output is a lot nicer and cleaner than the previous one. Now back to the HTML attributes of the header. We can use an object attribute called attrs to view the tag's attributes as a dictionary, with each HTML attribute name as a key and the attribute's value as the value in that key-value pair. Let me show you what I mean: let's use header.attrs.

As I said, using the attrs object attribute, we get a dictionary with the HTML attribute names as the keys (notice role and class right here) and the HTML attribute values as the values. Also notice that for multi-valued HTML attributes, you get a list of the values. Very neat and clean. Great. Now, as with any other dictionary, you can perform various operations on its elements. For instance, let's see how to access the value of a specific HTML attribute. Very easy to do: header["role"], then Enter.
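Extracting the header, pretty-printing it, and reading its attributes, as described above, can be sketched as follows. The header markup is a hypothetical, trimmed-down version of the one in the lecture. The exact attribute order printed by prettify() and attrs may vary, so no output is shown for those lines.

```python
from bs4 import BeautifulSoup

# Hypothetical header markup modeled on the one described in the lecture
html = """
<html><body>
<header role="banner" class="navbar navbar-fixed-top navbar-static">
  <div class="container"><a href="#">Home</a></div>
</header>
</body></html>
"""
result = BeautifulSoup(html, "html.parser")
header = result.header

# prettify() returns the tag's markup as an indented, one-node-per-line string
print(header.prettify())

# .attrs maps attribute names to values; "class" is multi-valued, so it is a list
print(header.attrs)
print(header["role"])  # banner
```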

Similarly, header["class"] gives us the list. We can achieve the same results with the get method, like this: header.get("role"), and likewise header.get("class"). The results are indeed identical. Please note that this is the get method of a Beautiful Soup Tag object, not the get function from the requests module.
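A small sketch comparing square-bracket lookup with Tag.get(). The header markup is hypothetical. The last two lookups show the one practical difference: for an attribute the tag doesn't have, get() returns None while square brackets raise KeyError.

```python
from bs4 import BeautifulSoup

html = '<header role="banner" class="navbar navbar-fixed-top navbar-static"></header>'
header = BeautifulSoup(html, "html.parser").header

print(header["class"])      # ['navbar', 'navbar-fixed-top', 'navbar-static']
print(header.get("role"))   # banner
print(header.get("class"))  # ['navbar', 'navbar-fixed-top', 'navbar-static']

# Missing attribute: get() returns None, square brackets raise KeyError
print(header.get("id"))     # None
try:
    header["id"]
except KeyError:
    print("no id attribute")
```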

Furthermore, we can modify a tag attribute's value just as we would in any other dictionary. To do that, we can write header["role"] = "sams" (I'm not very creative today), then Enter. Now let's look at header.attrs again. Notice that the value of role has changed accordingly. Great. What else?
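Overwriting an attribute value works like updating a dictionary key. Here is a minimal sketch using a hypothetical header tag and the same throwaway value, "sams".

```python
from bs4 import BeautifulSoup

html = '<header role="banner" class="navbar"></header>'
header = BeautifulSoup(html, "html.parser").header

# Overwrite an existing attribute, exactly like updating a dict key
header["role"] = "sams"
print(header["role"])  # sams
print(header.attrs)
```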

Let's add a new tag attribute to our dictionary. For instance, say we want to add header["new"] = "Python". If we use header.attrs once again, there is the new tag attribute and its value. Finally, let's see how to remove a tag attribute. Simple enough: as we do with dictionaries, we use del header["new"]. Let's check this once again.
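Adding and then deleting an attribute, as in the steps above, can be sketched like this. The header tag is hypothetical. Tag objects support both item assignment and del, just like a dictionary.

```python
from bs4 import BeautifulSoup

html = '<header role="banner" class="navbar"></header>'
header = BeautifulSoup(html, "html.parser").header

# Add a brand-new attribute
header["new"] = "Python"
print(header["new"])           # Python

# Remove it again with del
del header["new"]
print("new" in header.attrs)   # False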

And the attribute called new is now gone from the dictionary. Pretty straightforward, I think. Next, I'll show you how to extract the string, meaning only the text part, from a certain HTML tag. Let's consider the h2 tag from earlier in this video. Assuming that we want to extract only this string right here, "Top items being scraped right now", all we have to do is use the string object attribute, like this: h2.string. And there's the string, nice and clean. The last thing that I'm going to show you in this video is how to zoom into a certain area of the HTML tree on your web page.
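Extracting just the text with .string, as shown above, can be sketched as follows, using hypothetical markup. One caveat worth knowing: .string only works when a tag has a single child. For a tag with several children, such as the p below, it returns None.

```python
from bs4 import BeautifulSoup

html = "<body><h2>Top items being scraped right now</h2><p>Some <b>bold</b> text</p></body>"
result = BeautifulSoup(html, "html.parser")

h2 = result.h2
print(h2.string)                 # Top items being scraped right now
print(type(h2.string).__name__)  # NavigableString

# <p> has three children ("Some ", <b>, " text"), so .string is None
print(result.p.string)           # None
```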

What do I mean by zooming in? Let's take a look at the header area once again in our HTML code. This is it right here. You can see that it has a div tag nested inside it. This div tag has yet another div tag nested inside, this one right here, which in turn contains an a tag, a button tag, and four span tags at various levels of nesting. Let's say that you want to access the button tag.

That's this one right here, along with its content, several nesting levels below the header tag itself. How would you refer to the button tag given this HTML code structure? It's way easier than you might think. Let me show you: we can use result.header.div.div.a.button.
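Below is a minimal sketch of the step-by-step navigation just described. The HTML is a hypothetical, simplified version of the lecture's header structure. The class names are assumptions, and here all four span tags sit directly inside the button.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified version of the nested header markup
html = """
<header role="banner">
  <div class="container">
    <div class="navbar-header">
      <a href="#">
        <button type="button">
          <span>Toggle navigation</span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
        </button>
      </a>
    </div>
  </div>
</header>
"""
result = BeautifulSoup(html, "html.parser")

# Each step moves to the first matching tag below the previous one
button = result.header.div.div.a.button
print(button.name)         # button
print(button.span.string)  # Toggle navigation
```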

This way, you can zoom into whatever part of the HTML page you want. In this case, it would be even more convenient to use result.header.button, since this is the only button tag in the header, and the result is, of course, the same. However, keep in mind that using the tag name as an attribute returns only the first tag by that name. For instance, here we have four span tags, these ones right here, but with the same zooming method, only the first one will be returned. Let me test this: result.header.span.

And indeed, we get "Toggle navigation", which, if we look at the page source, is only the first span tag under the header tag. You will learn how to find all occurrences of a certain tag in the next lecture. I will see you there.
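The shortcut and the first-match behavior described in the last two paragraphs can be sketched as follows, again with hypothetical, simplified header markup. Because there is only one button, the short and long paths reach the same Tag object. With four span tags, attribute access still returns only the first one.

```python
from bs4 import BeautifulSoup

html = """
<header role="banner">
  <div class="container">
    <div class="navbar-header">
      <a href="#">
        <button type="button">
          <span>Toggle navigation</span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
          <span class="icon-bar"></span>
        </button>
      </a>
    </div>
  </div>
</header>
"""
result = BeautifulSoup(html, "html.parser")

# The shortcut finds the same <button> as the full path, since it is the only one
print(result.header.button is result.header.div.div.a.button)  # True

# With four <span> tags, attribute access returns only the first
print(result.header.span)  # <span>Toggle navigation</span>
```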
