The ^ anchor is used when the pattern must match at the start of the string.

In [1]:
import re
In [2]:
beginsWithHelloRegex = re.compile(r'^Hello')
In [3]:
beginsWithHelloRegex.search('Hello there...General Kenobi')
Out[3]:
<_sre.SRE_Match object; span=(0, 5), match='Hello'>
In [4]:
beginsWithHelloRegex.search('And he said Hello') is None
Out[4]:
True
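For comparison, re.match() only tries the pattern at the very start of the string, so for input like this it behaves like search() with a leading ^. A minimal sketch; the variable names are illustrative and not part of the original notebook:

```python
import re

# re.match() only attempts a match at position 0,
# so no explicit ^ is needed in the pattern.
hello_at_start = re.match(r'Hello', 'Hello there...General Kenobi')
hello_later = re.match(r'Hello', 'And he said Hello')

print(hello_at_start.group())  # Hello
print(hello_later is None)     # True
```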

The $ anchor is used when the pattern must match at the end of the string.

In [5]:
endsWithWorldRegex = re.compile(r'World$')
In [6]:
endsWithWorldRegex.search('Hello World')
Out[6]:
<_sre.SRE_Match object; span=(6, 11), match='World'>
In [7]:
endsWithWorldRegex.search('This is a World string') is None
Out[7]:
True

Using both ^ and $ means the pattern must match the entire string.

In [8]:
allDigitsRegex = re.compile(r'^\d+$') #one or more digits, anchored at both the start and the end of the string

Because the pattern has both ^ and $, the entire string must consist of one or more digits.

In [9]:
allDigitsRegex.search('434343')
Out[9]:
<_sre.SRE_Match object; span=(0, 6), match='434343'>
In [10]:
allDigitsRegex.search('654534354654454654545465132486434867436574')
Out[10]:
<_sre.SRE_Match object; span=(0, 42), match='654534354654454654545465132486434867436574'>
In [11]:
allDigitsRegex.search('0')
Out[11]:
<_sre.SRE_Match object; span=(0, 1), match='0'>
In [12]:
allDigitsRegex.search('') is None
Out[12]:
True
In [13]:
allDigitsRegex.search('34343x43433') is None
Out[13]:
True
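One subtlety worth checking: $ also matches just before a single trailing newline, so r'^\d+$' accepts '123\n'. If that matters, re.fullmatch() is a stricter alternative. A small sketch with illustrative variable names:

```python
import re

all_digits = re.compile(r'^\d+$')

# $ matches at the very end of the string,
# and also just before a single trailing newline.
accepts_trailing_newline = all_digits.search('123\n') is not None
print(accepts_trailing_newline)  # True

# fullmatch() requires the whole string to match, newline included.
strict_with_newline = re.fullmatch(r'\d+', '123\n')
strict_plain = re.fullmatch(r'\d+', '123')
print(strict_with_newline is None)  # True
print(strict_plain.group())         # 123
```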



The . (dot) matches any single character except a newline.

In [14]:
atRegex = re.compile(r'.at')  #any single character followed by 'at'
In [15]:
atRegex.findall('The cat in the flat sat on the hat mat')
Out[15]:
['cat', 'lat', 'sat', 'hat', 'mat']

Notice that 'flat' is not matched in full; only 'lat' is returned, because the single dot matches exactly one character before 'at'.

In [16]:
atRegex = re.compile(r'.{1,2}at') #'at' preceded by one or two characters of any kind
In [17]:
atRegex.findall('The cat in the flat sat on the hat mat')
Out[17]:
[' cat', 'flat', ' sat', ' hat', ' mat']

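Notice that some matches now include the leading space. .{1,2} is greedy and the dot matches any character, so it takes two characters whenever it can, even when one of them is a space.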
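If the goal is to capture only word characters before 'at', one option is to replace the dot with \w. This is a sketch of that variation, not something from the original notebook; the name word_at is made up for illustration:

```python
import re

sentence = 'The cat in the flat sat on the hat mat'

# \w matches letters, digits and underscore, but not spaces,
# so a leading space can no longer be captured.
word_at = re.compile(r'\w{1,2}at')
result = word_at.findall(sentence)
print(result)  # ['cat', 'flat', 'sat', 'hat', 'mat']
```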



.*

Dot-star matches anything.

. (dot) means any single character except a newline.

* (star) means zero or more of the preceding item.

In [18]:
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
In [19]:
nameRegex.findall('First Name: S Last Name: Dahiwal')
Out[19]:
[('S', 'Dahiwal')]
In [20]:
nameRegex.findall('First Name: Sat yam Last Name: Dahiwal')
Out[20]:
[('Sat yam', 'Dahiwal')]
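Because .* is greedy, the first group grabs as much as it can while still letting the rest of the pattern match. The sketch below uses a made-up input with the separator repeated to make that visible, and compares it with the non-greedy form .*? in the first group:

```python
import re

# Made-up input in which ' Last Name: ' appears twice.
text = 'First Name: A Last Name: B Last Name: C'

greedy_first = re.compile(r'First Name: (.*) Last Name: (.*)')
lazy_first = re.compile(r'First Name: (.*?) Last Name: (.*)')

greedy_groups = greedy_first.search(text).groups()
lazy_groups = lazy_first.search(text).groups()

print(greedy_groups)  # ('A Last Name: B', 'C')
print(lazy_groups)    # ('A', 'B Last Name: C')
```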

By default .* is greedy: it matches as much text as possible. For a non-greedy match, which takes as little text as possible, use .*?

In [21]:
serve = '<To serve humans> for dinner.>'
In [22]:
greedy = re.compile(r'<.*>')
In [23]:
greedy.findall(serve)
Out[23]:
['<To serve humans> for dinner.>']
In [24]:
nongreedy = re.compile(r'<.*?>')
In [25]:
nongreedy.findall(serve)  #It does the non-greedy match until it finds the closing angle bracket
Out[25]:
['<To serve humans>']
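A common alternative to a non-greedy quantifier is a negated character class, which simply forbids the closing bracket inside the match. A minimal sketch (the name negated is illustrative); note that, unlike the dot, [^>] also matches newlines:

```python
import re

serve = '<To serve humans> for dinner.>'

# [^>]* matches any run of characters that are not '>',
# so the match cannot run past the first closing bracket.
negated = re.compile(r'<[^>]*>')
result = negated.findall(serve)
print(result)  # ['<To serve humans>']
```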


In [26]:
prime = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'
In [27]:
print(prime)
Serve the public trust.
Protect the innocent.
Uphold the law.
In [28]:
dotStar = re.compile(r'.*')  #greedy
In [29]:
dotStar.search(prime)  #since it is greedy it will match as much as possible till it reaches a newline
Out[29]:
<_sre.SRE_Match object; span=(0, 23), match='Serve the public trust.'>
In [30]:
dotStar = re.compile(r'.*?')  #non-greedy: matches as little as possible, which here is the empty string
In [31]:
dotStar.search(prime)
Out[31]:
<_sre.SRE_Match object; span=(0, 0), match=''>
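The empty match happens because * allows zero repetitions and the non-greedy form prefers the shortest option. A non-greedy quantifier becomes useful once it must match at least one character or something has to follow it, as this small sketch suggests (variable names are illustrative):

```python
import re

prime = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# .*? may match zero characters, so the shortest match is empty.
empty_span = re.search(r'.*?', prime).span()
print(empty_span)  # (0, 0)

# .+? needs at least one character, so it stops after exactly one.
one_char = re.search(r'.+?', prime).group()
print(one_char)  # S

# With a literal dot required afterwards, the lazy match
# extends only as far as the first full stop.
first_sentence = re.search(r'.*?\.', prime).group()
print(first_sentence)  # Serve the public trust.
```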

We can make the dot match newlines as well by passing re.DOTALL as the second argument to re.compile().

In [32]:
dotStar = re.compile(r'.*', re.DOTALL)
In [33]:
dotStar.search(prime)
Out[33]:
<_sre.SRE_Match object; span=(0, 61), match='Serve the public trust.\nProtect the innocent.\nU>
In [34]:
print(dotStar.search(prime).group())
Serve the public trust.
Protect the innocent.
Uphold the law.
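The same behaviour can also be requested inside the pattern itself with the inline flag (?s), and re.S is the short alias for re.DOTALL. A brief sketch, with an illustrative variable name:

```python
import re

prime = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# (?s) at the start of the pattern is the inline form of re.DOTALL.
inline_dotall = re.compile(r'(?s).*')
matched_all = inline_dotall.search(prime).group() == prime
print(matched_all)  # True

# re.S is the short alias for re.DOTALL.
print(re.S == re.DOTALL)  # True
```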



In [35]:
vowelRegex = re.compile(r'[aeiou]')
In [36]:
vowelRegex.findall('Everything changes when you start to write something.')
Out[36]:
['e', 'i', 'a', 'e', 'e', 'o', 'u', 'a', 'o', 'i', 'e', 'o', 'e', 'i']

The result above does not include the capital 'E'; only lowercase vowels are returned, because regex matching is case-sensitive by default.

We can make Python match case-insensitively by passing re.IGNORECASE as the second argument to re.compile().

In [37]:
vowelRegex = re.compile(r'[aeiou]',re.IGNORECASE)
In [38]:
vowelRegex = re.compile(r'[aeiou]',re.I)

The two lines above are equivalent; re.I is shorthand for re.IGNORECASE.

In [39]:
vowelRegex.findall('Everything changes when you start to write something.')
Out[39]:
['E', 'e', 'i', 'a', 'e', 'e', 'o', 'u', 'a', 'o', 'i', 'e', 'o', 'e', 'i']



-^ means the string must start with the pattern, $ means the string must end with the pattern. Using both means the entire string must match the pattern.

-The . dot is a wildcard; it matches anything except newlines.

-Pass re.DOTALL as the second argument to re.compile() to make the . dot match newlines as well.

-Pass re.I as the second argument to re.compile() to make the matching case-insensitive.
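re.compile() takes a single flags argument, so to use re.DOTALL and re.I together they can be combined with the bitwise OR operator (|). A minimal sketch on a made-up string:

```python
import re

# Made-up two-line input.
text = 'First line\nSECOND line'

# Combine flags with |: the dot may match the newline (DOTALL)
# and 'second' may match 'SECOND' (I).
both_flags = re.compile(r'line.second', re.I | re.DOTALL)
combined = both_flags.search(text).group()
print(repr(combined))  # 'line\nSECOND'

# With only re.I the dot cannot cross the newline, so nothing matches.
only_ignorecase = re.search(r'line.second', text, re.I)
print(only_ignorecase is None)  # True
```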

Creating a regex to find all the email addresses in a given string.

In [40]:
regex = re.compile(r'(\w+@\w+\.\w+)')
In [41]:
regex.findall('The email is something@gmail.com, anotherExample@ex.ex, and the final one is sdhfsd@khdfks.fkjhdfh')
Out[41]:
['something@gmail.com', 'anotherExample@ex.ex', 'sdhfsd@khdfks.fkjhdfh']

Sometimes the part before the @ contains other characters, such as a dot (an underscore is already covered by \w).

We can include those characters too by creating our own character classes like following:

In [42]:
regex = re.compile(r'([\w\._]+@\w+\.\w+)')  #\. matches a literal dot; an unescaped . outside [...] would mean 'any character except newline'
In [43]:
regex.findall('The email is some.thing@gmail.com, another_Example@ex.ex, and the final one is sdh._fs._d@khdfks.fkjhdfh')
Out[43]:
['some.thing@gmail.com', 'another_Example@ex.ex', 'sdh._fs._d@khdfks.fkjhdfh']
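The pattern above still expects exactly one dot after the @, so an address with a multi-part domain is cut short. The sketch below compares it with a somewhat looser pattern. The looser pattern, its name and the sample addresses are assumptions made for illustration; it is not a complete email validator.

```python
import re

# Made-up sample text.
text = 'Write to a.b@mail.example.co.uk or x_y@ex.ex.'

simple = re.compile(r'([\w\._]+@\w+\.\w+)')

# Hypothetical looser pattern: also allows + and - before the @,
# and any number of dot-separated parts in the domain.
looser = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

simple_result = simple.findall(text)
looser_result = looser.findall(text)

print(simple_result)  # ['a.b@mail.example', 'x_y@ex.ex']
print(looser_result)  # ['a.b@mail.example.co.uk', 'x_y@ex.ex']
```

The domain part uses a non-capturing group (?:...) so that findall() still returns the whole match instead of just the last group.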