Sometimes we want a group to match only a specific number of repetitions. The ? character makes the group before it optional: it may appear zero times or once.
import re
batRegex = re.compile(r'Bat(wo)?man')  # equivalent to r'Batman|Batwoman', but a bit shorter
mo = batRegex.search('The adventures of Batman')
mo.group()
mo = batRegex.search('The adventures of Batwoman')
mo.group()
mo = batRegex.search('The adventures of Batwowowowowoman')
mo is None  # True: (wo)? allows at most one 'wo'
In the phone number example, suppose the message may or may not contain the area code. In that case, we can use the following pattern to find the phone number:
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo = phoneRegex.search('This is my number 333-2323')
mo.group()
mo = phoneRegex.search('This is my number 444-333-2323')
mo.group()
To match a literal question mark, escape it by putting a backslash in front of it: \?
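As a small illustration of escaping, the sketch below compares an escaped and an unescaped question mark. The patterns and sample strings are made up for this example.

```python
import re

# Escaped: \? matches a literal question mark after 'Batman'.
literalRegex = re.compile(r'Batman\?')
literal_match = literalRegex.search('Is that Batman? Yes.')
print(literal_match.group())  # Batman?

# Unescaped: ? makes the preceding 'n' optional instead.
optionalRegex = re.compile(r'Batman?')
optional_match = optionalRegex.search('Is that Batma')
print(optional_match.group())  # Batma
```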
batRegex = re.compile(r'Bat(wo)*man')  # the wo group may appear zero or more times
batRegex.search('The adventures of Batman')
batRegex.search('The adventures of Batwoman')
batRegex.search('The adventures of Batwowowowowowowowowowowowowoman')
The * character can be escaped by putting the escape character (\) before it, as in \*
batRegex = re.compile(r'Bat(wo)+man')  # the wo group must appear at least once (one or more times)
batRegex.search('The adventures of Batman') is None
batRegex.search('The adventures of Batwoman')
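To compare the three quantifiers seen so far (?, * and +), the following sketch runs each pattern against the same sample strings. The `samples`, `patterns` and `results` names are only for this illustration.

```python
import re

samples = ['Batman', 'Batwoman', 'Batwowoman']
patterns = {
    '?': re.compile(r'Bat(wo)?man'),  # zero or one 'wo'
    '*': re.compile(r'Bat(wo)*man'),  # zero or more 'wo'
    '+': re.compile(r'Bat(wo)+man'),  # one or more 'wo'
}

# For each quantifier, record which sample strings produce a match.
results = {
    symbol: [s for s in samples if regex.search(s) is not None]
    for symbol, regex in patterns.items()
}
for symbol, matched in results.items():
    print(symbol, matched)
# ? ['Batman', 'Batwoman']
# * ['Batman', 'Batwoman', 'Batwowoman']
# + ['Batwoman', 'Batwowoman']
```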
+ can be escaped with the escape character \, as in \+
Let's take an example that makes use of the escape character.
regex = re.compile(r'(\+\*\?)+')  # matches the literal text +*?, repeated one or more times
regex.search('I learned about +*? regex syntax')
regex.search('I learned about +*?+*?+*?+*? regex syntax')
For cleaner output, we can store the match object and call its group() method.
mo = regex.search('I learned about +*?+*?+*?+*? regex syntax')
mo.group()
What if we want to match a specific number of repetitions? For example, HaHaHa contains Ha three times.
haRegex = re.compile(r'(Ha){3}')
haRegex.search('The joke was so funny he said HaHaHa')
Above we can see that the string has matched the regular expression.
phoneRegex = re.compile(r'((\d\d\d-)?\d\d\d-\d\d\d\d(,)?){3}')
The code above creates a regex object that finds three phone numbers in a row, each with or without an area code, and the numbers may or may not be separated by commas.
phoneRegex.search('The numbers are 333-3333,333-333-3333,111-3333')
Without {3}, we would have to write the pattern out three times, like the following:
phoneRegex = re.compile(r'((\d\d\d-)?\d\d\d-\d\d\d\d(,)?)((\d\d\d-)?\d\d\d-\d\d\d\d(,)?)((\d\d\d-)?\d\d\d-\d\d\d\d(,)?)')
This gets confusing quickly.
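As a quick check that the two forms behave the same, the sketch below builds both patterns from one piece and compares the matches. The `onePhone` helper string and the use of string repetition are assumptions made for this illustration, not part of the notes above.

```python
import re

# One phone number, with optional area code and optional trailing comma.
onePhone = r'((\d\d\d-)?\d\d\d-\d\d\d\d(,)?)'
text = 'The numbers are 333-3333,333-333-3333,111-3333'

shortRegex = re.compile(onePhone + r'{3}')  # same pattern as the {3} version
longRegex = re.compile(onePhone * 3)        # the pattern written out three times

short_match = shortRegex.search(text)
long_match = longRegex.search(text)
print(short_match.group())  # 333-3333,333-333-3333,111-3333
print(short_match.group() == long_match.group())  # True
```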
haRegex = re.compile(r'(Ha){2,5}')  # matches Ha repeated two to five times
haRegex.search('everyone on hearing the joke said HaHa')
haRegex.search('everyone on hearing the joke said HaHaHaHaHaHa')
The search above matched only the first five Ha repetitions.
{,y} means we can have zero to y matches, and {x,} means we can have x or more matches (the unbounded case).
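The open-ended ranges can be tried out with a short sketch like the one below. The regex names and sample strings are chosen just for this example.

```python
import re

# {,3}: zero to three repetitions of 'Ha'.
upToThree = re.compile(r'(Ha){,3}')
up_to_three_match = upToThree.search('HaHaHaHaHa')
print(up_to_three_match.group())  # HaHaHa

# {3,}: three or more repetitions, with no upper limit.
threeOrMore = re.compile(r'(Ha){3,}')
three_or_more_match = threeOrMore.search('HaHaHaHaHa')
print(three_or_more_match.group())  # HaHaHaHaHa

# Two repetitions are too few for {3,}, so there is no match.
print(threeOrMore.search('HaHa') is None)  # True
```

Because {,3} allows zero repetitions, it can also match an empty string when the text does not start with Ha.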
By default, regular expressions in Python do greedy matches.
This means that they try to match the longest possible string that fits the pattern.
digitRegex = re.compile(r'(\d){3,5}')
digitRegex.search('123456789')
If we want a non-greedy match, add a ? after the closing curly brace.
digitRegex = re.compile(r'(\d){3,5}?')
digitRegex.search('123456789')
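To see the difference side by side, the following sketch runs the greedy and non-greedy versions on the same string and prints the matched text. The `greedyRegex` and `lazyRegex` names are used only for this comparison.

```python
import re

greedyRegex = re.compile(r'(\d){3,5}')   # takes as many digits as allowed
lazyRegex = re.compile(r'(\d){3,5}?')    # takes as few digits as allowed

greedy_match = greedyRegex.search('123456789')
lazy_match = lazyRegex.search('123456789')
print(greedy_match.group())  # 12345
print(lazy_match.group())    # 123
```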