sub() method

In [1]:
import re

The search() and findall() methods work like a word processor's Find feature: they locate the text that matches a pattern.
Word processors also have a Find and Replace feature.

We can do find and replace with regular expressions as well.

In [2]:
namesRegex = re.compile(r'Agent \w+')
In [3]:
namesRegex.findall('Agent Alice gave the secret files to Agent Bob')
Out[3]:
['Agent Alice', 'Agent Bob']

We can replace the agent names with the sub() method ("sub" as in substitution). Its first argument is the replacement text and its second is the string to search.

In [4]:
namesRegex.sub('[CLASSIFIED]', 'Agent Alice gave the secret files to Agent Bob')
Out[4]:
'[CLASSIFIED] gave the secret files to [CLASSIFIED]'
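As a small sketch of how sub() behaves beyond the basic call above: it returns a new string, it accepts an optional count argument that limits the number of replacements, and it returns the text unchanged when nothing matches. The variable names and the 'Nothing to hide here' string are made up for illustration.

```python
import re

names_regex = re.compile(r'Agent \w+')
message = 'Agent Alice gave the secret files to Agent Bob'

# sub() returns a new string; the original message is left untouched.
censored = names_regex.sub('[CLASSIFIED]', message)

# The optional count argument limits how many matches are replaced.
first_only = names_regex.sub('[CLASSIFIED]', message, count=1)

# With no match, sub() returns the input text unchanged.
no_match = names_regex.sub('[CLASSIFIED]', 'Nothing to hide here')

print(censored)    # [CLASSIFIED] gave the secret files to [CLASSIFIED]
print(first_only)  # [CLASSIFIED] gave the secret files to Agent Bob
print(no_match)    # Nothing to hide here
```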

Let's say we want to keep some part of the original string in the replacement, such as the first letter of each name.


In [5]:
namesRegex = re.compile(r'Agent (\w)\w+')   # the first character of the name is captured in a group


In [6]:
namesRegex.findall('Agent Alice gave the secret files to Agent Bob')  # with a group in the pattern, findall() returns only the group's text
Out[6]:
['A', 'B']
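The result above depends on how many groups the pattern contains. A short sketch of the three cases, using the same sentence (the variable names are illustrative only):

```python
import re

text = 'Agent Alice gave the secret files to Agent Bob'

# No groups: findall() returns the full text of each match.
no_groups = re.compile(r'Agent \w+').findall(text)

# One group: findall() returns only that group's text for each match.
one_group = re.compile(r'Agent (\w)\w+').findall(text)

# Two or more groups: findall() returns one tuple of groups per match.
two_groups = re.compile(r'Agent (\w)(\w+)').findall(text)

print(no_groups)   # ['Agent Alice', 'Agent Bob']
print(one_group)   # ['A', 'B']
print(two_groups)  # [('A', 'lice'), ('B', 'ob')]
```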


In [7]:
namesRegex.sub(r'Agent \1****', 'Agent Alice gave the files to Agent Bob')    # the replacement is a raw string so that \1 is not treated as an escape character
Out[7]:
'Agent A**** gave the files to Agent B****'

\1 stands for the text captured by the first group within each match.

The second group would be \2, the third group would be \3, and so on.
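To illustrate \1 and \2 together, here is a minimal sketch that reorders two captured names. The surnames Smith and Jones and the variable names are invented for this example.

```python
import re

# Group 1 captures the first name, group 2 the last name.
full_name_regex = re.compile(r'Agent (\w+) (\w+)')
text = 'Agent Alice Smith met Agent Bob Jones'

# \2 inserts the second group and \1 the first, for each match in turn.
reordered = full_name_regex.sub(r'Agent \2, \1', text)

print(reordered)  # Agent Smith, Alice met Agent Jones, Bob
```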



Verbose Mode with re.VERBOSE

In verbose mode, whitespace inside the pattern string is ignored, so it is not part of what we want to match. That means we can use triple quotes to spread the pattern over a multiline string and add a # comment to each line. The newlines won't be part of the pattern that we are looking for.

In [8]:
verboseRegex = re.compile(r'''
(                   # outer group so that | applies only to the area code
(\d\d\d-)|          # area code (without parentheses) and dash
(\(\d\d\d\))        # -or- area code (with parentheses) and no dash
)
\d\d\d              # first 3 digits
-                   # second dash
\d\d\d\d            # last 4 digits
\sx\d{2,4}          # extension, like x1234
''', re.VERBOSE)
# remember the comma between the pattern string and re.VERBOSE
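To check that whitespace and comments really are ignored, the sketch below compares a verbose phone-number pattern with the same pattern written on one line. The pattern is a simplified variant of the one above (it uses \d{3} and a single area-code group), and the sample phone numbers are made up.

```python
import re

verbose_phone = re.compile(r'''
    (\d{3}-|\(\d{3}\))   # area code: three digits and a dash, or in parentheses
    \d{3}                # first 3 digits
    -                    # dash
    \d{4}                # last 4 digits
    \sx\d{2,4}           # extension, like x1234
    ''', re.VERBOSE)

# The same pattern written compactly, without re.VERBOSE.
compact_phone = re.compile(r'(\d{3}-|\(\d{3}\))\d{3}-\d{4}\sx\d{2,4}')

text = 'Call 415-555-1234 x99 or (415)555-9999 x1234 today'

# finditer() with group(0) gives the full matches even though the pattern has a group.
found = [m.group(0) for m in verbose_phone.finditer(text)]
compact_found = [m.group(0) for m in compact_phone.finditer(text)]

print(found)  # ['415-555-1234 x99', '(415)555-9999 x1234']
```

Both patterns find the same two numbers, so the layout and comments change only how readable the pattern is.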


We have seen re.I (short for re.IGNORECASE) and re.DOTALL. If we want to use both at once, we can combine them in the following way.

In [9]:
regex = re.compile(r'\d', re.IGNORECASE | re.DOTALL | re.VERBOSE)  # combine the flags with the bitwise OR operator |

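re.compile() accepts only one flags value as its second argument, so the | above combines several flags into that single value. Combining options with a bitwise operator is a somewhat unusual, old-fashioned style, but it is how the re module expects multiple flags to be passed.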
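A small sketch of what the combined flags change in practice, using a made-up two-line string. Each pattern is the same; only the flags differ.

```python
import re

text = 'First line\nSECOND LINE'

# No flags: matching is case-sensitive, so 'first' does not match 'First'.
plain = re.compile(r'first.*line').search(text)

# IGNORECASE only: the match stops at the newline because '.' does not cross it.
ignore_only = re.compile(r'first.*line', re.IGNORECASE).search(text)

# IGNORECASE | DOTALL: '.' also matches the newline, so the match spans both lines.
combined = re.compile(r'first.*line', re.IGNORECASE | re.DOTALL).search(text)

print(plain)                 # None
print(ignore_only.group())   # First line
print(repr(combined.group()))  # 'First line\nSECOND LINE'
```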



-The sub() regex method will substitute matches with some other text.

-Using \1, \2, and so on in the replacement text inserts the text of group 1, 2, etc. from the regex pattern.

-Passing re.VERBOSE lets you add whitespace and comments to the regex string passed to re.compile().

-If you want to pass multiple flags (re.DOTALL, re.IGNORECASE, re.VERBOSE), combine them with the | bitwise operator.