Regular expressions allow you to specify a pattern of text to search for.
Regular expressions are a simplified mini-language for describing text patterns.
One example of a text pattern is a phone number.
In the US and Canada, a phone number looks like the following:
414-555-0000
The first three digits are the area code.
As humans, we can tell that the number above is a phone number, while 414,555,0000 is not.
def isPhoneNumber(text): # checks whether text is a US/Canadian-style phone number
    if len(text) != 12:
        print("not 12 characters long")
        return False # not phone-number sized
    for i in range(0, 3):
        if not text[i].isdecimal(): # isdecimal() is True only for decimal digit characters
            print("first 3 characters are not numbers")
            return False
    if text[3] != "-":
        print("missing dash after the area code")
        return False # missing dash
    for i in range(4, 7):
        if not text[i].isdecimal():
            print("three characters in the middle are not numbers")
            return False
    if text[7] != "-":
        print("missing second dash")
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            print("last four characters are not numbers")
            return False # the last 4 characters are not all digits
    return True
Note that a function stops executing as soon as it reaches a return statement, so only the first failed check prints a message.
print(isPhoneNumber("414-555-0000"))
print(isPhoneNumber("ldskfjksdjfl"))
print(isPhoneNumber("4149555-0000"))
print(isPhoneNumber("414-555-000"))
print(isPhoneNumber("414-555-00a0"))
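To isolate the early-return behavior described above, here is a minimal sketch using a hypothetical helper, checkLength(), which is not part of the original notes. When the length check fails, the function returns immediately and the print() call after it never runs.

```python
def checkLength(text):
    # Hypothetical helper, used only to illustrate early return.
    if len(text) != 12:
        return False  # the function stops here for wrong-length input
    print("length is fine")  # only reached when the check above passes
    return True

print(checkLength("414-555-000"))   # prints only False
print(checkLength("414-555-0000"))  # prints "length is fine", then True
```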
But what if we have a longer message and want to find out whether it contains a phone number?
We can do that as follows.
First, we redefine our function so that it only returns True or False and does not print anything to the screen.
def isPhoneNumber(text):
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    if text[3] != "-":
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != "-":
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True
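The same checks can also be written more compactly with string slicing and the built-in all(). This is only a sketch of an alternative; isPhoneNumberCompact() is a hypothetical name, and it follows the same rules as isPhoneNumber(): exactly 12 characters, dashes at index 3 and 7, digits everywhere else.

```python
def isPhoneNumberCompact(text):
    # Hypothetical alternative to isPhoneNumber(), same rules:
    # 12 characters, dashes at index 3 and 7, digits in every other position.
    return (
        len(text) == 12
        and text[3] == "-"
        and text[7] == "-"
        # join the three digit groups and check every character in them
        and all(ch.isdecimal() for ch in text[0:3] + text[4:7] + text[8:12])
    )

print(isPhoneNumberCompact("414-555-0000"))  # True
print(isPhoneNumberCompact("414-555-00a0"))  # False
```

Because `and` short-circuits, the indexing on text[3] and text[7] only happens after the length check has passed.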
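Now we write the number-finding function. It slides a 12-character window across the message and checks each chunk with isPhoneNumber().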
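def numberFinder(message):
    foundNumber = False
    # move a 12-character window across the message, one position at a time
    for i in range(len(message)):
        chunk = message[i:i+12] # chunks near the end are shorter, so isPhoneNumber() rejects them
        if isPhoneNumber(chunk):
            print("The number is " + chunk)
            foundNumber = True
    if not foundNumber:
        print("no numbers found in the message")
    return foundNumber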
numberFinder("Hey, you can call me on 414-444-5757 or if that number is not reachable you can use my other number 423-333-5555.")
numberFinder("hi, 433-4343-4343")
numberFinder("423-333-3333")
someMessage = "hello my number is 433-433-4343. Oops not that one, my real number is 333-444-2222. Ok bye now"
numberFinder(someMessage)
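A variant that returns the numbers it finds, instead of printing them, can be easier to reuse. The sketch below uses the hypothetical names looksLikePhoneNumber() and collectNumbers(), which are not from the original notes. It also stops the loop early enough that every chunk is a full 12 characters long, so the shorter chunks at the end of the message are never checked.

```python
def looksLikePhoneNumber(text):
    # Hypothetical compact checker with the same rules as isPhoneNumber().
    return (len(text) == 12 and text[3] == "-" and text[7] == "-"
            and all(ch.isdecimal() for ch in text[0:3] + text[4:7] + text[8:12]))

def collectNumbers(message):
    # Hypothetical variant of numberFinder(): returns a list of numbers found.
    found = []
    # only start positions that leave room for a full 12-character chunk
    for i in range(len(message) - 11):
        chunk = message[i:i + 12]
        if looksLikePhoneNumber(chunk):
            found.append(chunk)
    return found

someMessage = "hello my number is 433-433-4343. Oops not that one, my real number is 333-444-2222. Ok bye now"
print(collectNumbers(someMessage))  # ['433-433-4343', '333-444-2222']
```

For a message shorter than 12 characters, range() receives a negative number and simply produces no iterations, so the function returns an empty list.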
That is a lot of code! Regular expressions let us find the same phone numbers with far fewer lines.
import re
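Regex strings often contain backslashes (like \d), so they are usually written as raw strings: r'\d'. In a raw string, Python does not treat the backslash as the start of an escape sequence.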
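The following sketch shows what the r prefix changes: in a raw string, the backslash stays in the string as an ordinary character.

```python
# r'\d' is a two-character string: a backslash followed by the letter d.
print(r'\d' == '\\d')  # True: same as escaping the backslash in a normal string
print(len(r'\d'))      # 2

# Without the r prefix, '\n' is an escape sequence for a single newline character.
print(len('\n'))   # 1
print(len(r'\n'))  # 2
```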
Call the re.compile() function with a pattern string to create a Regex object.
In the pattern, \d is the regex for a single digit character.
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
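As a quick check, the compiled pattern finds the dashed number from the start of these notes but not the comma-separated one, because the pattern requires a literal dash between the digit groups. This sketch only compares the result of search() with None.

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# search() returns a Match object when the pattern is found, otherwise None.
print(phoneNumRegex.search("414-555-0000") is not None)  # True
print(phoneNumRegex.search("414,555,0000") is not None)  # False: commas, not dashes
```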
Call the Regex object's search() method on the string you want to search. It returns a Match object for the first match, or None if the pattern is not found.
someMessage = "hello my number is 433-433-4343. Oops not that one, my real number is 333-444-2222. Ok bye now"
matchObjBlaBla = phoneNumRegex.search(someMessage)
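Match objects have a group() method that returns the actual text that was matched.
print(matchObjBlaBla.group()) # prints just the matched text: 433-433-4343
print(matchObjBlaBla) # prints the Match object itself, including the span (start and end index) of the match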
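search() returns None when the pattern does not occur in the string, and calling group() on None raises an AttributeError. A minimal guard looks like this; findFirstNumber() is a hypothetical helper, not part of the re module.

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def findFirstNumber(text):
    # Hypothetical helper: returns the first phone number in text, or None.
    matchObj = phoneNumRegex.search(text)
    if matchObj is None:
        # No match: calling matchObj.group() here would raise AttributeError.
        return None
    return matchObj.group()

print(findFirstNumber("call me at 414-555-0000"))  # 414-555-0000
print(findFirstNumber("no number here"))           # None
```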
search() returns only the first occurrence. To get all occurrences, call the findall() method on the Regex object.
matchObjBlaBla = phoneNumRegex.findall(someMessage) # returns a list of strings, not a Match object
# a list has no group() method, so print the list directly
print(matchObjBlaBla) # ['433-433-4343', '333-444-2222']
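findall() can take over the job of the whole numberFinder() function from earlier. The sketch below uses the hypothetical name regexNumberFinder(); unlike numberFinder(), which returns True or False, it returns the list of numbers, but it prints the same messages.

```python
import re

# Same pattern as above: three digits, dash, three digits, dash, four digits.
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def regexNumberFinder(message):
    # Hypothetical regex-based counterpart of numberFinder().
    numbers = phoneNumRegex.findall(message)  # list of matched strings, [] if none
    for number in numbers:
        print("The number is " + number)
    if not numbers:
        print("no numbers found in the message")
    return numbers

regexNumberFinder("Hey, you can call me on 414-444-5757 or if that number is not reachable you can use my other number 423-333-5555.")
regexNumberFinder("hi, 433-4343-4343")
```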