Suppose a phone number has the form 444-444-4444: three digits, a hyphen, three digits, a hyphen, and four digits.

In [1]:
import re
phoneNumRegex = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")
matchobject = phoneNumRegex.search("my number is 434-343-4343")
matchobject.group()
Out[1]:
'434-343-4343'

But what if we want only the area code? For that, we use parentheses to mark out groups within the pattern.

In [2]:
phoneNumRegex = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")
In [3]:
matchobject = phoneNumRegex.search("my number is 434-343-4343")
In [4]:
matchobject.group()
Out[4]:
'434-343-4343'
In [5]:
matchobject.group(1)
Out[5]:
'434'
In [6]:
matchobject.group(2)
Out[6]:
'343-4343'

Calling group() or group(0) returns the full matched string, group(1) returns the string matched by group 1, and so on.
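
As a small sketch of the same idea, the groups() method of a match object returns all the captured groups at once as a tuple. The helper name split_phone is hypothetical, introduced only for this illustration. It also guards against search() returning None, which happens when the text contains no match.

```python
import re

phone_regex = re.compile(r"(\d\d\d)-(\d\d\d-\d\d\d\d)")

def split_phone(text):
    """Hypothetical helper: return (area_code, main_number), or None if no match."""
    match = phone_regex.search(text)
    if match is None:
        # search() returns None when the pattern is not found,
        # so calling .group() on the result would raise AttributeError.
        return None
    # groups() returns every captured group as a tuple: (group 1, group 2).
    return match.groups()

result = split_phone("my number is 434-343-4343")
print(result)                            # ('434', '343-4343')
print(split_phone("no number here"))     # None
```

Because groups() returns a tuple, the result can be unpacked directly, for example area_code, main_number = split_phone(text), once it has been checked for None.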

If the parentheses are part of the string we want to match, we have to escape them with a backslash.

In [7]:
phoneNumRegex = re.compile(r"\(\d\d\d\)-\(\d\d\d-\d\d\d\d\)")
In [8]:
matchobject = phoneNumRegex.search("my number is (434)-(343-4343)")   # the parentheses are part of the text
In [9]:
matchobject.group()
Out[9]:
'(434)-(343-4343)'
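
Escaped and unescaped parentheses can be combined in one pattern. The sketch below assumes the same input text as above; the name paren_regex is made up for this example. Escaped parentheses match the literal characters, while the unescaped pairs inside them still capture groups.

```python
import re

# \( and \) match literal parentheses in the text.
# The unescaped ( ) inside them capture the digits as groups 1 and 2.
paren_regex = re.compile(r"\((\d\d\d)\)-\((\d\d\d-\d\d\d\d)\)")

match = paren_regex.search("my number is (434)-(343-4343)")
print(match.group())    # (434)-(343-4343)
print(match.group(1))   # 434
print(match.group(2))   # 343-4343
```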



The Pipe Character

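The vertical bar |, found above the Enter key, is called the pipe character. In a regex it means "or": the pattern matches either the expression on its left or the one on its right.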

What if we want to find all the words that share a fixed prefix, such as batman and batmobile, which both begin with bat?

In [10]:
batRegex = re.compile(r'bat(man|mobile|copter|cave|bat)')

The parentheses after bat contain the possible suffixes, separated by the pipe character.

In [11]:
matchobject = batRegex.search('batman rides his batcopter and batmobile to his batcave. nanananannnan batbat. lol')
In [12]:
matchobject.group()
Out[12]:
'batman'

If we want only the suffix of the first occurrence, we can ask for group 1:

In [13]:
matchobject.group(1)
Out[13]:
'man'

The calls above return only the first match, because search() stops at the first occurrence of the pattern in the text.
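
To make "first" concrete, here is a short sketch with made-up input strings. search() reports the match that starts earliest in the text, regardless of the order of the alternatives. When two alternatives could match at the same position, the one listed first in the pattern wins.

```python
import re

bat_regex = re.compile(r'bat(man|mobile|copter|cave|bat)')

# 'copter' is listed after 'man' in the pattern, but batcopter
# appears earlier in the text, so it is the match that search() returns.
first = bat_regex.search('the batcopter follows batman')
print(first.group())    # batcopter

# Alternatives are tried left to right at each position.
# 'man' succeeds before 'mansion' is ever tried.
short_first = re.compile(r'bat(man|mansion)').search('batmansion')
print(short_first.group())   # batman

# Listing the longer alternative first changes the result.
long_first = re.compile(r'bat(mansion|man)').search('batmansion')
print(long_first.group())    # batmansion
```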

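To get every match, call findall() on the regex object. Because the pattern contains a single group, findall() returns a list of the strings matched by that group, not a match object: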

In [14]:
matchobject = batRegex.findall('batman rides his batcopter and batmobile to his batcave. nanananannnan batbat. lol')
In [15]:
matchobject
Out[15]:
['man', 'copter', 'mobile', 'cave', 'bat']
In [16]:
for suffix in matchobject:
    print("bat" + suffix)
batman
batcopter
batmobile
batcave
batbat
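
Gluing the prefix back on works, but there are two other ways to get the whole words; the sketch below uses the same sentence as above, and the variable names are chosen only for this example. A non-capturing group, written (?:...), groups the alternatives without capturing them, so findall() returns the full matches. Alternatively, finditer() yields match objects, which give access to both the full match and the group.

```python
import re

text = 'batman rides his batcopter and batmobile to his batcave. nanananannnan batbat. lol'

# (?:...) groups the alternatives without creating a capture group,
# so findall() returns the whole matched words.
full_regex = re.compile(r'bat(?:man|mobile|copter|cave|bat)')
words = full_regex.findall(text)
print(words)   # ['batman', 'batcopter', 'batmobile', 'batcave', 'batbat']

# finditer() yields one match object per match, so both the
# full match and the captured suffix are available.
bat_regex = re.compile(r'bat(man|mobile|copter|cave|bat)')
pairs = [(m.group(), m.group(1)) for m in bat_regex.finditer(text)]
print(pairs[0])   # ('batman', 'man')
```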