Python 3 Regex - match() & search()

Python 3: From Scratch to Intermediate ADVANCED LEVEL: Python 3 - Regular Expressions
16 minutes

Transcript

First, let's try to define a regular expression, or regex, as it is usually abbreviated. A regular expression is represented by a special syntax that helps you find and match a pattern inside a string. Let me give you a very basic example. Let's say you have this string right here in the Python interpreter. Now what if you want to find the programming language in this string that starts with a capital P and has four letters? Or the programming languages starting with the substring Java, regardless of J being uppercase or lowercase? Or maybe you want the last four characters preceding the digit 3, so these characters right here. For all of those cases, and many more, you will use regular expressions.
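The three lookups mentioned above could be sketched roughly as follows. The sample string is hypothetical, since the one typed in the video is not part of this transcript, and the patterns are just one possible way to express each request.

```python
import re

# Hypothetical sample string; the string used in the lecture is not shown in the transcript.
mystr = "You can learn any programming language, whether it is Python2, Python3, Perl, Java, javascript or PHP."

# A four-letter word that starts with a capital P.
four_letter_p = re.search(r"\bP[a-z]{3}\b", mystr)

# A word starting with "java", whatever the case of its letters.
java_word = re.search(r"java\w*", mystr, re.I)

# The four characters that come right before the digit 3.
before_three = re.search(r"(.{4})3", mystr)

print(four_letter_p.group())    # Perl
print(java_word.group())        # Java
print(before_three.group(1))    # thon
```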

During this lecture and the next one, we will talk about four methods that Python provides for dealing with regular expressions and text pattern matching. Another thing that you should know is that all operations related to regular expressions are provided by the built-in re module. So, whenever you want to use regular expressions, you should first import this module using the import re statement. So import re, and then prepend the name of each method with re and a dot. Now let's start discussing the first method, which is match.

The match method searches for the pattern you provide as its first argument in the string you enter as its second argument, but only at the beginning of that string. The general syntax for this method is the following, where a is called a matching object or a match object if the pattern is found at the beginning of the string; otherwise, a will be None. Now let's return to our string, mystr, and use the match method to see if the characters Y, o and u, so You, are indeed positioned at the beginning of the string. Let me write this first.

So a equals re.match, with "You", comma, mystr. Let's see a. So you see that a is a match object, because Python has found the string given as a pattern at the beginning of the mystr string. Now, if you try to match a random string at the beginning of mystr, you should get None returned. Don't trust me, let's verify this. So let's say a, and we're trying to match a random string, ABC, at the beginning of mystr. Now let's see a and type of a.
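A minimal sketch of the two calls described above, assuming a string that begins with "You". The exact string from the video is not shown in this transcript, so mystr here is only a stand-in.

```python
import re

# Stand-in string that begins with "You"; the lecture's exact string is an assumption.
mystr = "You can learn any programming language, including Python3."

# General form: re.match(pattern, string, flags=0)
a = re.match("You", mystr)   # pattern sits at the very start -> match object
b = re.match("ABC", mystr)   # pattern is not at the start -> None

print(a)         # a match object describing the span and the matched text
print(b)         # None
print(type(b))   # <class 'NoneType'>
```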

So indeed, this time a is of type NoneType. Okay, now let's redefine a as it was before, so that it holds a match object again. You can have the entire match returned by using the group method on the match object. So let's see this in action: a.group, open and close parentheses. And as I said, Python returns the match it found in the string, according to the pattern we provided. In this case, the pattern was the substring itself, You.
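As a rough illustration of the group method, again with a stand-in string. The None check is an addition of this sketch: calling group on None would raise an AttributeError, so it is a common precaution in scripts.

```python
import re

# Stand-in string; the lecture's exact string is an assumption.
mystr = "You can learn any programming language, including Python3."

a = re.match("You", mystr)

# re.match returns None when nothing matches, so check before calling group().
if a is not None:
    matched_text = a.group()
    print(matched_text)   # You
else:
    matched_text = None
    print("no match at the beginning of the string")
```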

Now, let me give you an example that uses an optional flag. Let's redefine a, and add a flag that tells Python to ignore the case of the matched letters. This way, Python won't care whether the characters it is trying to match are uppercase or lowercase, it just matches them. So let's try this. Let's say you with a lowercase y. And also let's add the optional flag I was telling you about, re.I with a capital I. Enter.

Now let's see a.group. So this time, we provided the string you in all lowercase. Although at the beginning of mystr we have You written with a capital Y, the match method still returned a valid match object, because we specified the re.I flag for ignoring the case as the third argument of the method. There are other optional flags as well. I just wanted to give you an example for now, because I find this particular flag very useful when working with regular expressions. Now, for example, let's say we don't have the substring You at the beginning of the string; instead, we have it somewhere in the middle. So let's redefine our string, and I will delete You at the beginning of the string and place it somewhere in the middle. Enter.
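The effect of the re.I flag described above can be sketched like this, using a stand-in string that starts with "You". Note that group returns the text as it appears in the string, not as it was typed in the pattern.

```python
import re

# Stand-in string; the lecture's exact string is an assumption.
mystr = "You can learn any programming language, including Python3."

without_flag = re.match("you", mystr)        # case-sensitive: no match
with_flag = re.match("you", mystr, re.I)     # re.I is short for re.IGNORECASE

print(without_flag)        # None
print(with_flag.group())   # You
```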

Now let's try to match this again. Let's see a and type of a. This time the match method returns None, because the pattern we are searching for is not located at the beginning of the string. To search for a pattern across the entire string, we will use the search method. That's why this method is used more frequently than the match method. We will also start using regular expression specific syntax, rather than just searching for an explicit substring inside a larger string.

The syntax for the search method is similar to the syntax we've seen so far for the match method. So let's see it. This is it right here, you can notice that it is very similar. Now let's open another Python interpreter session. And let's define another string.
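To make the difference between match and search concrete, here is a small sketch with a made-up string in which "You" sits in the middle rather than at the start.

```python
import re

# Made-up string with "You" placed in the middle.
mystr = "I think You can learn any programming language."

# General form: re.search(pattern, string, flags=0)
at_start = re.match("You", mystr)     # only looks at the beginning -> None
anywhere = re.search("You", mystr)    # scans the whole string -> match object

print(at_start)            # None
print(anywhere.group())    # You
print(anywhere.start())    # 8, the index where the match begins
```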

Now, we will use a more complex string which actually represents an ARP entry from the ARP table of a network device. Don't worry if you don't know what an ARP entry really is. Its meaning is not relevant to us right now. However, the diversity of characters inside such a string is very useful for our purpose for the rest of this lecture. Let me define the string. Okay, so this is it right here.

Just like the match method, search will return a match object if the pattern is found in the string, and None if nothing is found. It's time to see it in action. Let me write this first, and then we will study the syntax. What was my mistake here? I haven't imported the re module.

So let me do it right now. And now let's redefine a again. Okay, so what language is this, you might ask? Well, this is the syntax of regular expressions and pattern matching. Don't worry, you'll get used to it. Now let's take this one bit at a time.

First, you can see that the arguments of the search method are the following: first, the pattern to be matched, and secondly, after the comma, the variable which holds the string in which to search for a pattern match. No optional flags this time. Another thing here is the lowercase r, right here before the pattern. This means that the pattern should be treated as a raw string. It is useful to avoid conflicts between the way Python recognizes an escape sequence and the way the re module does it. We will talk more about escape sequences soon.
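A short sketch of why the r prefix matters. The word-boundary sequence \b is used here only as an illustration and is not part of the lecture's pattern: in a normal Python string, "\b" is a backspace character, so the re module never sees the sequence it expects.

```python
import re

# In a normal string "\n" is one character (a newline); in a raw string it is two.
print(len("\n"), len(r"\n"))   # 1 2

# A raw string is the same as doubling every backslash by hand.
same_pattern = r"\d" == "\\d"
print(same_pattern)            # True

text = "a cat sat"
with_raw = re.search(r"\bcat\b", text)     # \b reaches re as a word boundary
without_raw = re.search("\bcat\b", text)   # "\b" became a backspace character

print(with_raw.group())   # cat
print(without_raw)        # None
```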

Now let's dissect the syntax for the pattern. First, notice the parentheses surrounding some of the symbols. Any pair of parentheses matches the regular expression in between them, so this one right here, and, as defined by Python's official documentation, the parentheses indicate the start and the end of a group. If a match is found for the pattern inside the parentheses, then the contents of that group can be extracted using the group method applied to the match object, which in our case is called a. Now inside the first pair of parentheses, we have a dot, a plus sign and a question mark.

Each of these symbols has a meaning and matches something in particular in regular expression syntax. A dot represents any character except a newline character. The plus sign means that the previous expression, which in our case is just a dot, may repeat one or more times. So this group will look for at least one character, any character except the newline character. It doesn't matter if it is a letter, a number, a space, a punctuation mark or whatever. Now remember that the plus sign, which as I said means one or more repetitions of the expression before it, and the asterisk, which means zero or more repetitions of the expression before it, are both greedy, meaning they try to match as much text as possible. Let me show you what I mean. What match will Python find for the first group of parentheses, having the question mark as it is right now?

To answer this, we will use the group method with one as an argument, meaning the first group in the pattern. So let me type in a.group of one. Now let's remove the question mark from the first group and see what the search method returns. So I'm going to delete this question mark right here. And now let's do a.group of one again. Notice the difference. In the second case, the plus sign was so greedy that it also included two of the three white spaces following it in the matching group.

So here we had one, two, three white spaces. And now we have one and two. But when we had the question mark up here, the match was made in a minimal fashion, matching as few characters as possible. Now we've seen the role of the question mark in regular expressions: placed after the plus sign or the asterisk, it makes the match non-greedy. This means that the choice of matching in a greedy or in a minimal fashion is totally yours. One more thing about this greedy behavior of the plus and asterisk signs: up to which point is the plus sign acting greedy?

In our example, by looking at the initial string, we see that after the IP address right here there are three white spaces, as I already said, and then the zero digit and the rest of the string. That's why, inside the regular expression pattern, I typed in a space after closing the parentheses. So you can see the space right here. My intention was to tell Python that I want to match a group until a space is encountered.

When the question mark was inside the parentheses, Python understood that it should match any character one or more times until the first space character occurs, so in a non-greedy way. That's why the matching stopped right before the first space and all we got was just the IP address. When I didn't use the question mark, Python understood that it should match any character one or more times, as greedily as possible, until the last space character before the digit is encountered. That's why the matching stopped right before the third, the last space character after the IP address. Since the matching has been done in a greedy way, this time we got the IP address followed by two space characters. Again, that's because Python stopped matching right before the third space. So notice that for a correct match result, you should be very careful about both the greedy behavior and the characters following the group, because they act as limits or borders of the pattern you want to match.
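The greedy and non-greedy behavior described above can be reproduced with a sketch like the following. The ARP-style string and the shortened pattern are assumptions made for illustration; the string typed in the video is not shown in this transcript.

```python
import re

# Assumed ARP-style entry: IP address, three spaces, a digit, three spaces,
# MAC address, one space, VLAN number, three spaces, a letter.
arp = "22.22.22.1   0   b4:a9:5a:ff:c8:45 VLAN#222   L"

lazy = re.search(r"(.+?) +(\d)", arp)     # question mark: match as little as possible
greedy = re.search(r"(.+) +(\d)", arp)    # no question mark: match as much as possible

# repr() makes the trailing spaces visible.
print(repr(lazy.group(1)))     # '22.22.22.1'
print(repr(greedy.group(1)))   # '22.22.22.1  '
```

In the greedy case the group keeps two of the three spaces, because the space and plus sign after the group still need at least one space to match before the digit.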

Now moving beyond the first group, we have the space we talked about, followed by a plus sign. This means that, looking at the initial string, we are expecting one or more space characters after the IP address. We know there are actually three spaces there, but it is safer to use one space and the plus sign for future compatibility. Maybe you will use this regular expression inside a test automation suite and the output of some command will slightly change at some point in the future, and instead of three spaces your output will only have two spaces in that spot. You don't want to have failures in your scripts and tests caused by a fixed number of spaces defined in your regular expression pattern.
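A small sketch of that point, comparing a pattern that hard-codes three spaces with one that uses a space and a plus sign. Both input strings are made up for illustration.

```python
import re

# Made-up outputs: the same fields separated by three spaces and by two spaces.
three_spaces = "22.22.22.1   0"
two_spaces = "22.22.22.1  0"

fixed = r"(.+?)   (\d)"     # exactly three literal spaces
flexible = r"(.+?) +(\d)"   # one or more spaces

print(re.search(fixed, three_spaces) is not None)      # True
print(re.search(fixed, two_spaces) is not None)        # False: the fixed pattern breaks
print(re.search(flexible, three_spaces).group(1))      # 22.22.22.1
print(re.search(flexible, two_spaces).group(1))        # 22.22.22.1
```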

I've been through this. And trust me, being somewhat flexible with your pattern matching in Python can save you from a lot of headaches. Now, let's look at the second group in the pattern. We have a backslash d inside the parentheses. Backslash d is called a special sequence in Python, and it means any decimal digit, so any digit from zero to nine. Python will expect a single digit here. So let's verify this with the group method, using a.group of two. And indeed, we have the single digit we were expecting to match according to the second group, this is this zero right here. Next we have another sequence of spaces up to the first non-space character, which is b.
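For the second group, a sketch along these lines shows backslash d picking up a single digit. The ARP-style string is assumed, as the original is not shown in this transcript.

```python
import re

# Assumed ARP-style entry; the string from the video is not shown in the transcript.
arp = "22.22.22.1   0   b4:a9:5a:ff:c8:45 VLAN#222   L"

a = re.search(r"(.+?) +(\d) +", arp)

print(a.group(1))   # 22.22.22.1
print(a.group(2))   # 0

# \d on its own matches exactly one digit, even when more digits follow.
single_digit = re.search(r"\d", "VLAN#222")
print(single_digit.group())   # 2
```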

So we have these spaces right here before b. After this sequence, we have a group similar to the first group we previously discussed. Let's say we want this group, which will be group number three, to match a substring consisting of both the MAC address, this string right here, and the VLAN number, so this string right here, and implicitly the space between the two. For this, we use the same dot, plus, question mark syntax. But we should also specify the limit up to which we should consider this group, the border.

So that would be the first character or sequence of characters at which Python should stop matching this group. Of course, we could use the same syntax as before, with a single space and the plus sign, but that wouldn't be fun, right? Instead, we can use something even more interesting. First, another special sequence, backslash s, which matches any whitespace character, whether it is a space, a tab or a newline character. Secondly, I used a number followed by a comma in between curly braces. This means Python should expect two or more occurrences of the pattern preceding the curly braces, a whitespace in our case.

Remember that not typing the comma inside the curly braces will tell Python that it should expect exactly two repetitions of the previous pattern, meaning the whitespace. If we matched just a single whitespace here, the matching would stop at that one space character in between the MAC address and the word VLAN. Instead, by specifying two or more white spaces, Python will know that it needs to stop the matching right after this 2 right here, because that's where two or more spaces reside. The final result for this group would be the substring starting at b up to the last 2 right here. That's actually the border or limit we defined right here in the pattern.
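The role of the curly braces as a border for the third group can be sketched as follows, again with an assumed ARP-style string. The first pattern stops at a single whitespace, the second one only where two or more whitespace characters follow.

```python
import re

# Assumed ARP-style entry; the string from the video is not shown in the transcript.
arp = "22.22.22.1   0   b4:a9:5a:ff:c8:45 VLAN#222   L"

one_space = re.search(r"(.+?) +(\d) +(.+?)\s", arp)          # border: one whitespace
two_or_more = re.search(r"(.+?) +(\d) +(.+?)\s{2,}", arp)    # border: two or more

print(one_space.group(3))     # b4:a9:5a:ff:c8:45
print(two_or_more.group(3))   # b4:a9:5a:ff:c8:45 VLAN#222
```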

Finally, the backslash w special sequence followed by an asterisk matches zero or more occurrences of any word character, meaning lowercase letters a to z, uppercase letters A to Z, digits zero to nine, and the underscore character. This means that after the two or more white spaces that we previously defined, we should match a word character. And that is, of course, the letter L right here. Now it's time to check out all the groups we have defined inside the pattern. So we've already seen a.group of one and a.group of two. Let's see the third group, which is indeed the MAC address and the VLAN number.

And finally, a.group of four will return, as we said, the letter L. Great. Also remember that a.group without any arguments and a.group of zero both return the same thing, the entire match, which in this case is the entire string. Another method we can use when matching patterns is the groups method, which returns all the groups matched by the pattern in the form of a tuple, where each group is an element of that tuple. So let's try this: a.groups, open and close parentheses. And indeed, we got the tuple I was talking about. I think this video is long enough.
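Putting the pieces together, a full pattern along the lines discussed in this lecture might look like the sketch below. Both the ARP-style string and the exact placement of the asterisk in the fourth group are assumptions, since neither is visible in this transcript.

```python
import re

# Assumed ARP-style entry; the string from the video is not shown in the transcript.
arp = "22.22.22.1   0   b4:a9:5a:ff:c8:45 VLAN#222   L"

# Four groups: IP address, single digit, MAC address plus VLAN, trailing word characters.
a = re.search(r"(.+?) +(\d) +(.+?)\s{2,}(\w*)", arp)

print(a.group(4))               # L
print(a.group() == a.group(0))  # True: both return the entire match
print(a.group(0) == arp)        # True here, because the match spans the whole string
print(a.groups())
# ('22.22.22.1', '0', 'b4:a9:5a:ff:c8:45 VLAN#222', 'L')
```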

Now, in the next one we will discuss two other very useful regular expression methods, findall and sub.
