The ^ (caret) anchors the pattern to the start of the string.
import re
beginsWithHelloRegex = re.compile(r'^Hello')
beginsWithHelloRegex.search('Hello there...General Kenobi')
beginsWithHelloRegex.search('And he said Hello') is None
The $ (dollar sign) anchors the pattern to the end of the string.
endsWithWorldRegex = re.compile(r'World$')
endsWithWorldRegex.search('Hello World')
endsWithWorldRegex.search('This is a World string') is None
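The search calls above return a match object on success and None otherwise. A minimal sketch that makes those results visible; the helper name matched_text is made up for illustration and is not part of the re module:

```python
import re

begins_with_hello = re.compile(r'^Hello')
ends_with_world = re.compile(r'World$')

def matched_text(pattern, text):
    # Hypothetical helper: return the matched text, or None when nothing matches.
    match = pattern.search(text)
    return match.group() if match is not None else None

print(matched_text(begins_with_hello, 'Hello there...General Kenobi'))  # Hello
print(matched_text(begins_with_hello, 'And he said Hello'))             # None
print(matched_text(ends_with_world, 'Hello World'))                     # World
print(matched_text(ends_with_world, 'This is a World string'))          # None
```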
Using both ^ and $ means the pattern must match the entire string.
allDigitsRegex = re.compile(r'^\d+$') # the string must consist of one or more digits and nothing else
With both ^ and $ present, the whole string has to fit the \d+ pattern, not just part of it.
allDigitsRegex.search('434343')
allDigitsRegex.search('654534354654454654545465132486434867436574')
allDigitsRegex.search('0')
allDigitsRegex.search('') is None
allDigitsRegex.search('34343x43433') is None
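One caveat about "the entire string": in Python, $ also matches just before a newline at the very end of the string. The sketch below shows this, using re.fullmatch as a stricter alternative. The trailing-newline test string is an illustrative assumption, not one of the examples above.

```python
import re

all_digits = re.compile(r'^\d+$')

# $ matches at the end of the string or just before a final newline,
# so a trailing '\n' does not prevent a match.
print(all_digits.search('434343\n') is not None)   # True

# re.fullmatch requires the whole string, newline included, to fit the pattern.
print(re.fullmatch(r'\d+', '434343\n') is None)    # True
print(re.fullmatch(r'\d+', '434343') is not None)  # True
```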
atRegex = re.compile(r'.at') # any single character (except a newline) followed by 'at'
atRegex.findall('The cat in the flat sat on the hat mat')
Notice that 'flat' is not matched in full; only 'lat' is returned, because the pattern has a single dot and so allows only one character before 'at'.
atRegex = re.compile(r'.{1,2}at') # 'at' preceded by one or two characters of any kind
atRegex.findall('The cat in the flat sat on the hat mat')
Notice that the leading space is now included in some of the matches, because a space also counts as "any character".
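Since the results are not shown above, here is a small self-contained sketch that runs both patterns on the same sentence and prints what findall returns (the variable names are illustrative):

```python
import re

sentence = 'The cat in the flat sat on the hat mat'

# One arbitrary character before 'at'.
one_char = re.compile(r'.at').findall(sentence)
# One or two arbitrary characters before 'at' (greedy, so two when possible).
one_or_two = re.compile(r'.{1,2}at').findall(sentence)

print(one_char)    # ['cat', 'lat', 'sat', 'hat', 'mat']
print(one_or_two)  # [' cat', 'flat', ' sat', ' hat', ' mat']
```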
. (dot) matches any single character except a newline.
* (star) means zero or more of the preceding element, so .* matches any run of characters.
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
nameRegex.findall('First Name: S Last Name: Dahiwal')
nameRegex.findall('First Name: Sat yam Last Name: Dahiwal')
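Because the pattern has two groups, findall returns a list of tuples, one tuple per match. A short sketch of what the two calls above produce:

```python
import re

name_regex = re.compile(r'First Name: (.*) Last Name: (.*)')

# Each tuple holds (first group, second group) for one match.
single_word = name_regex.findall('First Name: S Last Name: Dahiwal')
with_space = name_regex.findall('First Name: Sat yam Last Name: Dahiwal')

print(single_word)  # [('S', 'Dahiwal')]
print(with_space)   # [('Sat yam', 'Dahiwal')]
```

The second call shows that .* happily spans the space inside 'Sat yam'.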
For a non-greedy match, which consumes as few characters as possible, use .*? instead.
serve = '<To serve humans> for dinner.>'
greedy = re.compile(r'<.*>')
greedy.findall(serve)
nongreedy = re.compile(r'<.*?>')
nongreedy.findall(serve) # the non-greedy match stops at the first closing angle bracket
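The difference is easiest to see side by side. A minimal sketch using the same string (the result variable names are illustrative):

```python
import re

serve = '<To serve humans> for dinner.>'

# Greedy: runs to the last '>' in the string.
greedy_result = re.findall(r'<.*>', serve)
# Non-greedy: stops at the first '>'.
nongreedy_result = re.findall(r'<.*?>', serve)

print(greedy_result)     # ['<To serve humans> for dinner.>']
print(nongreedy_result)  # ['<To serve humans>']
```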
prime = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'
print(prime)
dotStar = re.compile(r'.*') #greedy
dotStar.search(prime) # greedy, so it matches as much as possible, but . stops at the first newline
dotStar = re.compile(r'.*?') # non-greedy version: matches as little as possible, which here is the empty string
dotStar.search(prime)
To make the dot match newlines as well, pass re.DOTALL as the second argument to re.compile.
dotStar = re.compile(r'.*', re.DOTALL)
dotStar.search(prime)
print(dotStar.search(prime).group())
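To compare the three variants directly, this sketch collects the matched text of each one (the variable names are illustrative):

```python
import re

prime = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

greedy_text = re.search(r'.*', prime).group()            # stops before the first newline
lazy_text = re.search(r'.*?', prime).group()             # matches nothing at all
dotall_text = re.search(r'.*', prime, re.DOTALL).group() # spans all three lines

print(repr(greedy_text))     # 'Serve the public trust.'
print(repr(lazy_text))       # ''
print(dotall_text == prime)  # True
```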
vowelRegex = re.compile(r'[aeiou]')
vowelRegex.findall('Everything changes when you start to write something.')
The result above does not include the capital 'E'; the character class lists lowercase vowels only.
To make the matching case-insensitive, pass re.IGNORECASE as the second argument to re.compile.
vowelRegex = re.compile(r'[aeiou]',re.IGNORECASE)
vowelRegex = re.compile(r'[aeiou]',re.I)
The two lines above are equivalent; re.I is shorthand for re.IGNORECASE.
vowelRegex.findall('Everything changes when you start to write something.')
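A quick sketch of the effect of the flag, comparing the first few matches with and without it (the variable names are illustrative):

```python
import re

sentence = 'Everything changes when you start to write something.'

lower_only = re.findall(r'[aeiou]', sentence)
any_case = re.findall(r'[aeiou]', sentence, re.IGNORECASE)

print(lower_only[:3])  # ['e', 'i', 'a']
print(any_case[:3])    # ['E', 'e', 'i']

# The only extra match is the capital 'E' at the start of the sentence.
print(len(any_case) - len(lower_only))  # 1
```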
regex = re.compile(r'(\w+@\w+\.\w+)')
regex.findall('The email is something@gmail.com, anotherExample@ex.ex, and the final one is sdhfsd@khdfks.fkjhdfh')
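The part of an email address before the @ can contain other characters, such as a dot, which \w does not match (the underscore is already covered by \w).
We can allow those characters by writing our own character class, as follows:
regex = re.compile(r'([\w\._]+@\w+\.\w+)') # before the TLD, \. is needed because a bare . matches any character except a newline; inside [...] the dot is already literal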
regex.findall('The email is some.thing@gmail.com, another_Example@ex.ex, and the final one is sdh._fs._d@khdfks.fkjhdfh')
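To see why the character class matters, this sketch runs both versions of the pattern on the second test string. The shortened class [\w.] is an equivalent simplification assumed here, since a dot is literal inside brackets and \w already includes the underscore.

```python
import re

text = ('The email is some.thing@gmail.com, another_Example@ex.ex, '
        'and the final one is sdh._fs._d@khdfks.fkjhdfh')

# \w+ cannot cross a dot, so the local part is cut off at the last dot.
plain = re.findall(r'\w+@\w+\.\w+', text)
# Allowing dots in the local part recovers the full addresses.
with_dots = re.findall(r'[\w.]+@\w+\.\w+', text)

print(plain)
# ['thing@gmail.com', 'another_Example@ex.ex', '_d@khdfks.fkjhdfh']
print(with_dots)
# ['some.thing@gmail.com', 'another_Example@ex.ex', 'sdh._fs._d@khdfks.fkjhdfh']
```

Neither pattern is a full email validator; both only illustrate how character classes widen what a match can contain.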