import re
phoneRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
message = 'Remember this is a message with the first number as 4444-444-4444, 343-333-3333 and the final number is 333-222-3333'
phoneRegex.search(message)
phoneRegex.findall(message)
search() returns a Match object for the first match (or None if nothing matches)
findall() returns a list of all non-overlapping matches
If the regular expression string (here r'\d\d\d-\d\d\d-\d\d\d\d') does not have more than one group, then the findall method just returns a list of strings. This is the behaviour for regex objects that have zero or one groups in them.
Let's do an example with a regex object that has two or more groups in it.
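As a minimal sketch of the difference between the two methods, the snippet below runs a group-free pattern against a made-up sample string (the text and the variable names are assumptions chosen for illustration, not part of the notes above).

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')  # pattern with no groups
text = 'Call 415-555-1011 today or 415-555-9999 tomorrow'  # made-up sample text

first = phone_regex.search(text)          # Match object for the first hit, or None
print(first.group())                      # 415-555-1011

all_numbers = phone_regex.findall(text)   # list of every matching string
print(all_numbers)                        # ['415-555-1011', '415-555-9999']

missing = phone_regex.search('no digits here')  # nothing to match
print(missing)                            # None
```

Because search() can return None, it is usually worth checking the result before calling group() on it.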
phoneRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
phoneRegex.findall(message)
Instead of returning a list of strings, it returns a list of tuples, with one tuple per match and one element per group.
Notice that the - between the two groups does not appear in the tuples above. If it is needed, it can be placed inside one of the groups, and it will then appear in the corresponding element of each tuple.
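The following sketch compares the two cases side by side, using a made-up sample string (the text and variable names are assumptions for illustration). Moving the hyphen inside the first pair of parentheses makes it part of the first tuple element.

```python
import re

text = 'Call 415-555-1011 today or 415-555-9999 tomorrow'  # made-up sample text

# Hyphen between the groups: it is matched but not captured.
two_groups = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
without_hyphen = two_groups.findall(text)
print(without_hyphen)  # [('415', '555-1011'), ('415', '555-9999')]

# Hyphen inside the first group: it is captured with the area code.
hyphen_in_group = re.compile(r'(\d\d\d-)(\d\d\d-\d\d\d\d)')
with_hyphen = hyphen_in_group.findall(text)
print(with_hyphen)     # [('415-', '555-1011'), ('415-', '555-9999')]
```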
If the entire number is needed as one string, the whole pattern can be wrapped in one outer group.
phoneRegex = re.compile(r'((\d\d\d)-(\d\d\d-\d\d\d\d))')
phoneRegex.findall(message)
The order of the groups is decided by the position of their opening parentheses, from left to right, so the outer group comes first in each tuple.
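To make the numbering concrete, here is a small sketch that uses search() on a made-up string (an assumption for illustration) and reads each group by its number.

```python
import re

nested = re.compile(r'((\d\d\d)-(\d\d\d-\d\d\d\d))')
match = nested.search('Call 415-555-1011 today')  # made-up sample text

# Group numbers follow the opening parentheses, left to right.
print(match.group(1))  # 415-555-1011  (outer group opens first)
print(match.group(2))  # 415
print(match.group(3))  # 555-1011

all_groups = match.groups()
print(all_groups)      # ('415-555-1011', '415', '555-1011')
```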
\d is a character class that represents any digit from 0 to 9
digitRegex = re.compile(r'(0|1|2|3|4|5|6|7|8|9)')
digitRegex = re.compile(r'\d')
For the digits 0 to 9, the two lines of code above are equivalent.
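A quick way to check this is to run both patterns over the same text. The sketch below uses a made-up sample string and illustrative variable names. Since the alternation pattern has exactly one group, findall() still returns a plain list of strings.

```python
import re

alternation = re.compile(r'(0|1|2|3|4|5|6|7|8|9)')
shorthand = re.compile(r'\d')

sample = 'Room 42, floor 7'  # made-up sample text

alt_digits = alternation.findall(sample)   # one group, so a list of strings
short_digits = shorthand.findall(sample)   # no groups, also a list of strings

print(alt_digits)    # ['4', '2', '7']
print(short_digits)  # ['4', '2', '7']
```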
Standard Character Class | Represents |
---|---|
\d | Any numeric digit from 0 to 9 |
\D | Any character that is not a numeric digit from 0 to 9 |
\w | Any letter, numeric digit, or the underscore character. (Think of this as matching "word" characters.) |
\W | Any character that is not a letter, numeric digit, or the underscore character. |
\s | Any space, tab, or newline character. (Think of this as matching "space" characters.) |
\S | Any character that is not a space, tab, or newline. |
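The table can be tried out with a short sketch like the one below, which applies each shorthand class to one small made-up string (the string and the `results` dictionary are assumptions for illustration).

```python
import re

sample = 'Hi_5 ok!'  # made-up text: letters, underscore, digit, space, punctuation

# Map each shorthand class to the characters it finds in the sample.
results = {
    pattern: re.findall(pattern, sample)
    for pattern in (r'\d', r'\D', r'\w', r'\W', r'\s', r'\S')
}

for pattern, found in results.items():
    print(pattern, found)
# \d ['5']
# \D ['H', 'i', '_', ' ', 'o', 'k', '!']
# \w ['H', 'i', '_', '5', 'o', 'k']
# \W [' ', '!']
# \s [' ']
# \S ['H', 'i', '_', '5', 'o', 'k', '!']
```

Each uppercase class returns exactly the characters that its lowercase counterpart leaves out.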
lyrics = 'There are 12 soldiers, 500000 peoples, 44 apples, 22 videogames, 27 trees, and 3232343435 books'
regex = re.compile(r'\d+\s\w+')
regex.search(lyrics)
regex.findall(lyrics)
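As a sketch of how \d+\s\w+ behaves, the snippet below applies the same pattern to a shorter made-up string (the string and variable names are assumptions for illustration). The pattern reads as: one or more digits, one whitespace character, then one or more word characters.

```python
import re

quantity_regex = re.compile(r'\d+\s\w+')
inventory = '7 swans, 6 geese, and 12 drummers'  # made-up sample text

first = quantity_regex.search(inventory).group()  # first match only
print(first)       # 7 swans

everything = quantity_regex.findall(inventory)    # every match
print(everything)  # ['7 swans', '6 geese', '12 drummers']
```

The commas are left out of the matches because \w does not match punctuation.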
vowelRegex = re.compile(r'[aeiou]') # same as r'(a|e|i|o|u)'
The bracket form (on the left) is more useful because it also supports ranges, as in the following lines
small_letterRegex = re.compile(r'[a-z]')
all_letters_from_a_to_f_Regex = re.compile(r'[a-fA-F]')
vowelRegex = re.compile(r'[aeiouAEIOU]')
vowelRegex.findall('The title for Avengers four is Avengers:0.5')
vowelRegex = re.compile(r'[aeiouAEIOU]{2}')
vowelRegex.findall('The title for Avengers four is Avengers:0.5')
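The effect of the {2} quantifier is easier to see on a short string. The sketch below uses a made-up phrase and illustrative variable names: the first pattern returns every vowel individually, while the second returns only runs of exactly two vowels in a row, without overlapping.

```python
import re

phrase = 'Queen of the sea'  # made-up sample text

single_vowel = re.compile(r'[aeiouAEIOU]')
singles = single_vowel.findall(phrase)
print(singles)  # ['u', 'e', 'e', 'o', 'e', 'e', 'a']

vowel_pair = re.compile(r'[aeiouAEIOU]{2}')
pairs = vowel_pair.findall(phrase)
print(pairs)    # ['ue', 'ea']
```

In 'Queen' the three vowels 'uee' yield only 'ue', because the remaining 'e' is not followed by another vowel.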
If the caret symbol ^ is added at the start of the character class, the class is negated:
vowelRegex = re.compile(r'[^aeiouAEIOU]')
The regex above now matches every character that is not in the character class.
In this case, that includes the consonants.
vowelRegex.findall('The title for Avengers four is Avengers:0.5')
Notice that the result contains not only letters but also digits, punctuation, and spaces, i.e. any character that is not one of aeiou or AEIOU.
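The sketch below shows this on a short made-up string. It also shows one possible way to match consonants only, by listing ranges that skip the vowels. The consonant pattern and all names here are assumptions for illustration and cover only unaccented ASCII letters.

```python
import re

sample = 'Hi, you 2!'  # made-up sample text

# Negated class: everything except vowels, including spaces and punctuation.
non_vowel = re.compile(r'[^aeiouAEIOU]')
non_vowels = non_vowel.findall(sample)
print(non_vowels)  # ['H', ',', ' ', 'y', ' ', '2', '!']

# Consonants only: letter ranges with the vowels left out.
consonant = re.compile(r'[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]')
consonants = consonant.findall(sample)
print(consonants)  # ['H', 'y']
```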