---
title: Python Regular Expressions - Python Cheatsheet
description: A regular expression (shortened as regex) is a sequence of characters that specifies a search pattern in text and used by string-searching algorithms.
---
Regular Expressions
Regular expressions
A regular expression (shortened as regex [...]) is a sequence of characters that specifies a search pattern in text. [...] used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
1. Import the regex module with `import re`.
2. Create a Regex object with the `re.compile()` function. (Remember to use a raw string.)
3. Pass the string you want to search into the Regex object’s `search()` method. This returns a `Match` object.
4. Call the Match object’s `group()` method to return a string of the actual matched text.
All the regex functions in Python are in the re module:
```python
>>> import re
```
## Regex symbols
| Symbol | Matches |
| ------------------------ | ------------------------------------------------------ |
| `?` | zero or one of the preceding group. |
| `*` | zero or more of the preceding group. |
| `+` | one or more of the preceding group. |
| `{n}` | exactly n of the preceding group. |
| `{n,}` | n or more of the preceding group. |
| `{,m}` | 0 to m of the preceding group. |
| `{n,m}` | at least n and at most m of the preceding p. |
| `{n,m}?` or `*?` or `+?` | performs a non-greedy match of the preceding p. |
| `^spam` | means the string must begin with spam. |
| `spam$` | means the string must end with spam. |
| `.` | any character, except newline characters. |
| `\d`, `\w`, and `\s` | a digit, word, or space character, respectively. |
| `\D`, `\W`, and `\S` | anything except a digit, word, or space, respectively. |
| `[abc]` | any character between the brackets (such as a, b, ). |
| `[^abc]` | any character that isn’t between the brackets. |
## Matching regex objects
```python
>>> phone_num_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo = phone_num_regex.search('My number is 415-555-4242.')
>>> print(f'Phone number found: {mo.group()}')
# Phone number found: 415-555-4242
```
## Grouping with parentheses
```python
>>> phone_num_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phone_num_regex.search('My number is 415-555-4242.')
>>> mo.group(1)
# '415'
>>> mo.group(2)
# '555-4242'
>>> mo.group(0)
# '415-555-4242'
>>> mo.group()
# '415-555-4242'
```
To retrieve all the groups at once use the `groups()` method:
```python
>>> mo.groups()
('415', '555-4242')
>>> area_code, main_number = mo.groups()
>>> print(area_code)
415
>>> print(main_number)
555-4242
```
## Multiple groups with Pipe
You can use the `|` character anywhere you want to match one of many expressions.
```python
>>> hero_regex = re.compile (r'Batman|Tina Fey')
>>> mo1 = hero_regex.search('Batman and Tina Fey.')
>>> mo1.group()
# 'Batman'
>>> mo2 = hero_regex.search('Tina Fey and Batman.')
>>> mo2.group()
# 'Tina Fey'
```
You can also use the pipe to match one of several patterns as part of your regex:
```python
>>> bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = bat_regex.search('Batmobile lost a wheel')
>>> mo.group()
# 'Batmobile'
>>> mo.group(1)
# 'mobile'
```
## Optional matching with the Question Mark
The `?` character flags the group that precedes it as an optional part of the pattern.
```python
>>> bat_regex = re.compile(r'Bat(wo)?man')
>>> mo1 = bat_regex.search('The Adventures of Batman')
>>> mo1.group()
# 'Batman'
>>> mo2 = bat_regex.search('The Adventures of Batwoman')
>>> mo2.group()
# 'Batwoman'
```
## Matching zero or more with the Star
The `*` (star or asterisk) means “match zero or more”. The group that precedes the star can occur any number of times in the text.
```python
>>> bat_regex = re.compile(r'Bat(wo)*man')
>>> mo1 = bat_regex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = bat_regex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
>>> mo3 = bat_regex.search('The Adventures of Batwowowowoman')
>>> mo3.group()
'Batwowowowoman'
```
## Matching one or more with the Plus
The `+` (or plus) _means match one or more_. The group preceding a plus must appear at least once:
```python
>>> bat_regex = re.compile(r'Bat(wo)+man')
>>> mo1 = bat_regex.search('The Adventures of Batwoman')
>>> mo1.group()
# 'Batwoman'
>>> mo2 = bat_regex.search('The Adventures of Batwowowowoman')
>>> mo2.group()
# 'Batwowowowoman'
>>> mo3 = bat_regex.search('The Adventures of Batman')
>>> mo3 is None
# True
```
## Matching specific repetitions with Curly Brackets
If you have a group that you want to repeat a specific number of times, follow the group in your regex with a number in curly brackets:
```python
>>> ha_regex = re.compile(r'(Ha){3}')
>>> mo1 = ha_regex.search('HaHaHa')
>>> mo1.group()
# 'HaHaHa'
>>> mo2 = ha_regex.search('Ha')
>>> mo2 is None
# True
```
Instead of one number, you can specify a range with minimum and a maximum in between the curly brackets. For example, the regex (Ha){3,5} will match 'HaHaHa', 'HaHaHaHa', and 'HaHaHaHaHa'.
```python
>>> ha_regex = re.compile(r'(Ha){2,3}')
>>> mo1 = ha_regex.search('HaHaHaHa')
>>> mo1.group()
# 'HaHaHa'
```
## Greedy and non-greedy matching
Python’s regular expressions are greedy by default: in ambiguous situations they will match the longest string possible. The non-greedy version of the curly brackets, which matches the shortest string possible, has the closing curly bracket followed by a question mark.
```python
>>> greedy_ha_regex = re.compile(r'(Ha){3,5}')
>>> mo1 = greedy_ha_regex.search('HaHaHaHaHa')
>>> mo1.group()
# 'HaHaHaHaHa'
>>> non_greedy_ha_regex = re.compile(r'(Ha){3,5}?')
>>> mo2 = non_greedy_ha_regex.search('HaHaHaHaHa')
>>> mo2.group()
# 'HaHaHa'
```
## The findall() method
The `findall()` method will return the strings of every match in the searched string.
```python
>>> phone_num_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d') # has no groups
>>> phone_num_regex.findall('Cell: 415-555-9999 Work: 212-555-0000')
# ['415-555-9999', '212-555-0000']
```
## Making your own character classes
You can define your own character class using square brackets. For example, the character class _[aeiouAEIOU]_ will match any vowel, both lowercase and uppercase.
```python
>>> vowel_regex = re.compile(r'[aeiouAEIOU]')
>>> vowel_regex.findall('Robocop eats baby food. BABY FOOD.')
# ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']
```
You can also include ranges of letters or numbers by using a hyphen. For example, the character class _[a-zA-Z0-9]_ will match all lowercase letters, uppercase letters, and numbers.
By placing a caret character (`^`) just after the character class’s opening bracket, you can make a negative character class that will match all the characters that are not in the character class:
```python
>>> consonant_regex = re.compile(r'[^aeiouAEIOU]')
>>> consonant_regex.findall('Robocop eats baby food. BABY FOOD.')
# ['R', 'b', 'c', 'p', ' ', 't', 's', ' ', 'b', 'b', 'y', ' ', 'f', 'd', '.', '
# ', 'B', 'B', 'Y', ' ', 'F', 'D', '.']
```
## The Caret and Dollar sign characters
- You can also use the caret symbol `^` at the start of a regex to indicate that a match must occur at the beginning of the searched text.
- Likewise, you can put a dollar sign `$` at the end of the regex to indicate the string must end with this regex pattern.
- And you can use the `^` and `$` together to indicate that the entire string must match the regex.
The `r'^Hello`' regular expression string matches strings that begin with 'Hello':
```python
>>> begins_with_hello = re.compile(r'^Hello')
>>> begins_with_hello.search('Hello world!')
# <_sre.SRE_Match object; span=(0, 5), match='Hello'>
>>> begins_with_hello.search('He said hello.') is None
# True
```
The `r'\d\$'` regular expression string matches strings that end with a numeric character from 0 to 9:
```python
>>> whole_string_is_num = re.compile(r'^\d+$')
>>> whole_string_is_num.search('1234567890')
# <_sre.SRE_Match object; span=(0, 10), match='1234567890'>
>>> whole_string_is_num.search('12345xyz67890') is None
# True
>>> whole_string_is_num.search('12 34567890') is None
# True
```
## The Wildcard character
The `.` (or dot) character in a regular expression will match any character except for a newline:
```python
>>> at_regex = re.compile(r'.at')
>>> at_regex.findall('The cat in the hat sat on the flat mat.')
['cat', 'hat', 'sat', 'lat', 'mat']
```
## Matching everything with Dot-Star
```python
>>> name_regex = re.compile(r'First Name: (.*) Last Name: (.*)')
>>> mo = name_regex.search('First Name: Al Last Name: Sweigart')
>>> mo.group(1)
# 'Al'
>>> mo.group(2)
'Sweigart'
```
The `.*` uses greedy mode: It will always try to match as much text as possible. To match any and all text in a non-greedy fashion, use the dot, star, and question mark (`.*?`). The question mark tells Python to match in a non-greedy way:
```python
>>> non_greedy_regex = re.compile(r'<.*?>')
>>> mo = non_greedy_regex.search(' for dinner.>')
>>> mo.group()
# ''
>>> greedy_regex = re.compile(r'<.*>')
>>> mo = greedy_regex.search(' for dinner.>')
>>> mo.group()
# ' for dinner.>'
```
## Matching newlines with the Dot character
The dot-star will match everything except a newline. By passing `re.DOTALL` as the second argument to `re.compile()`, you can make the dot character match all characters, including the newline character:
```python
>>> no_newline_regex = re.compile('.*')
>>> no_newline_regex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
# 'Serve the public trust.'
>>> newline_regex = re.compile('.*', re.DOTALL)
>>> newline_regex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
# 'Serve the public trust.\nProtect the innocent.\nUphold the law.'
```
## Case-Insensitive matching
To make your regex case-insensitive, you can pass `re.IGNORECASE` or `re.I` as a second argument to `re.compile()`:
```python
>>> robocop = re.compile(r'robocop', re.I)
>>> robocop.search('Robocop is part man, part machine, all cop.').group()
# 'Robocop'
>>> robocop.search('ROBOCOP protects the innocent.').group()
# 'ROBOCOP'
>>> robocop.search('Al, why does your programming book talk about robocop so much?').group()
# 'robocop'
```
## Substituting strings with the sub() method
The `sub()` method for Regex objects is passed two arguments:
1. The first argument is a string to replace any matches.
1. The second is the string for the regular expression.
The `sub()` method returns a string with the substitutions applied:
```python
>>> names_regex = re.compile(r'Agent \w+')
>>> names_regex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
# 'CENSORED gave the secret documents to CENSORED.'
```
## Managing complex Regexes
To tell the `re.compile()` function to ignore whitespace and comments inside the regular expression string, “verbose mode” can be enabled by passing the variable `re.VERBOSE` as the second argument to `re.compile()`.
Now instead of a hard-to-read regular expression like this:
```python
phone_regex = re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')
```
you can spread the regular expression over multiple lines with comments like this:
```python
phone_regex = re.compile(r'''(
(\d{3}|\(\d{3}\))? # area code
(\s|-|\.)? # separator
\d{3} # first 3 digits
(\s|-|\.) # separator
\d{4} # last 4 digits
(\s*(ext|x|ext.)\s*\d{2,5})? # extension
)''', re.VERBOSE)
```