Python Regex: A Complete Guide to the re Module
Python's built-in re module is one of the most useful tools in the standard library — once you understand the handful of core functions, you can parse text, validate inputs, extract data, and transform strings with precision. This guide covers everything you need to use Python regular expressions effectively: matching functions, groups, substitution, flags, compilation, and practical patterns for real tasks.
Raw Strings: Always Use r'...'
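Before writing any Python regex pattern, get one habit right: always use raw strings (r'pattern'). Without the r prefix, Python processes backslash escape sequences in the string literal before the regex engine ever sees the pattern. For example, '\b' in a regular string is a single backspace character, but r'\b' passes \b to the regex engine as a word boundary. Unrecognized escapes such as '\d' happen to survive unchanged, but Python reports them as invalid escape sequences (a SyntaxWarning since Python 3.12), so relying on that is fragile.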
Rule of thumb: always write r'\d+', never '\d+'.
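A quick way to see the difference is to compare string lengths and run the same word-boundary search with and without the prefix. This is a small illustrative sketch; the sample text is made up:

```python
import re

# '\b' in a normal literal is ONE character (backspace, \x08);
# r'\b' keeps both characters, so the regex engine sees a word boundary.
print(len('\b'), len(r'\b'))  # 1 2

text = 'cat concat cat'

# Without r: the pattern contains literal backspace characters, so nothing matches.
print(re.findall('\bcat\b', text))   # []

# With r: \b is a word boundary, so the 'cat' inside 'concat' is skipped.
print(re.findall(r'\bcat\b', text))  # ['cat', 'cat']
```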
re.match(), re.search(), re.fullmatch()
These three functions are the foundation. The key distinction is where in the string the match is attempted:
import re

# re.match(): matches at the BEGINNING of the string only
m = re.match(r'\d+', '123 apples')
m.group()  # '123'
re.match(r'\d+', 'apples 123')  # None: pattern not at start

# re.search(): finds the FIRST match anywhere in the string
m = re.search(r'\d+', 'apples 123 oranges 456')
m.group()  # '123'

# re.fullmatch(): the ENTIRE string must match the pattern
re.fullmatch(r'\d+', '123')     # match object
re.fullmatch(r'\d+', '123abc')  # None: not a full match
| Function | Matches where? | Returns |
|---|---|---|
| re.match() | Start of string only | Match object or None |
| re.search() | Anywhere in string (first match) | Match object or None |
| re.fullmatch() | Entire string must match | Match object or None |
All three return a Match object on success (truthy), or None on failure. Always check for None before calling .group(): calling it on None raises an AttributeError.
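One way to follow that advice is to test the result before touching it. The helper below, first_number, is a hypothetical name used only for illustration; it returns the matched text or None:

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    # re.search returns None on failure, so test the result before calling .group()
    m = re.search(r'\d+', text)
    return m.group() if m else None

print(first_number('apples 123 oranges 456'))  # 123
print(first_number('no digits here'))          # None
```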
re.findall() — All Matches at Once
re.findall() returns a list of all non-overlapping matches in the string. What each list item contains depends on how many capturing groups your pattern has:
import re
# re.findall() — returns a list of all non-overlapping matches
text = "Order #1042 total: $89.99, Order #1043 total: $120.00"
# Extract all order numbers
re.findall(r'#(\d+)', text)
# → ['1042', '1043']
# Extract all prices
re.findall(r'\$([\d.]+)', text)
# → ['89.99', '120.00']
# With no groups: returns list of full matches
re.findall(r'\d+', text)
# → ['1042', '89', '99', '1043', '120', '00']
# With multiple groups: returns list of tuples
re.findall(r'#(\d+).+?\$([\d.]+)', text)
# → [('1042', '89.99'), ('1043', '120.00')]
Important: when your pattern has groups, findall returns the group contents, not the full match. With two or more groups, each item in the list is a tuple of those groups. This is the most common source of confusion for Python regex beginners.
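The sketch below shows that pitfall on a made-up phone-number string, plus two common ways around it: a non-capturing group (?:...) when you only need the full match, or re.finditer() when you need both the full match and the group contents:

```python
import re

text = "Call 555-1234 or 555-9876"

# A capturing group makes findall return only the group's text...
only_groups = re.findall(r'(555)-\d{4}', text)
print(only_groups)   # ['555', '555']

# ...while a non-capturing group (?:...) gives back the full matches.
full_matches = re.findall(r'(?:555)-\d{4}', text)
print(full_matches)  # ['555-1234', '555-9876']

# Need both? Iterate over Match objects: group(0) is the full match.
both = [(m.group(0), m.group(1)) for m in re.finditer(r'(555)-\d{4}', text)]
print(both)          # [('555-1234', '555'), ('555-9876', '555')]
```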
re.finditer() — Matches with Position Info
Use re.finditer() when you need the match position or want to process each match individually without loading all results into memory at once:
import re
# re.finditer() — returns an iterator of Match objects
# Better than findall when you need match position or full match details
text = "cat sat on the mat"
for m in re.finditer(r'[cm]at', text):
    print(f"Found '{m.group()}' at position {m.start()}-{m.end()}")
# Output:
# Found 'cat' at position 0-3
# Found 'mat' at position 15-18
# Access span: m.start(), m.end(), m.span()
# Access full match: m.group() or m.group(0)
# Access groups: m.group(1), m.group(2), etc.
Groups: Capturing and Naming
Groups let you extract specific parts of a match. Use parentheses to create a capturing group, (?P<name>...) for a named group, and (?:...) for a non-capturing group when you need grouping without capture:
import re
# Named groups: (?P<name>pattern)
date_pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = date_pattern.search("Today is 2026-03-06")
m.group('year') # '2026'
m.group('month') # '03'
m.group('day') # '06'
m.groupdict() # {'year': '2026', 'month': '03', 'day': '06'}
# Numbered groups: group(1), group(2), ...
m2 = re.search(r'(\w+)@(\w+)\.(\w+)', 'user@example.com')
m2.group(1) # 'user'
m2.group(2) # 'example'
m2.group(3) # 'com'
# Non-capturing group: (?:...) — group but don't capture
re.findall(r'(?:https?://)([\w.]+)', 'https://example.com http://test.org')
# → ['example.com', 'test.org']
Named groups ((?P<name>...)) make patterns significantly more readable and maintainable, especially for date/time parsing, log extraction, and any pattern with three or more groups. Use m.groupdict() to get all named captures as a dictionary.
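As an illustration of the log-extraction case, here is a minimal sketch that parses lines in an assumed "LEVEL [YYYY-MM-DD] message" format (a hypothetical layout chosen for the example, not a standard) into dictionaries, then reuses named groups in a re.sub() replacement via \g<name>:

```python
import re

# Hypothetical log format, assumed for this sketch: "LEVEL [YYYY-MM-DD] message"
log_re = re.compile(
    r'(?P<level>[A-Z]+) \[(?P<date>\d{4}-\d{2}-\d{2})\] (?P<msg>.*)'
)

lines = [
    "ERROR [2026-03-06] disk full",
    "INFO [2026-03-06] backup done",
    "not a log line",
]

# groupdict() turns each match into a dict keyed by group name;
# lines that do not match (match() returns None) are skipped.
records = []
for line in lines:
    m = log_re.match(line)
    if m:
        records.append(m.groupdict())
print(records)

# Named groups can also be referenced in a replacement string with \g<name>.
reformatted = re.sub(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
                     r'\g<d>/\g<m>/\g<y>', '2026-03-06')
print(reformatted)  # 06/03/2026
```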
re.sub() — Find and Replace
re.sub(pattern, replacement, string) replaces all matches of the pattern with the replacement. The replacement can be a string with backreferences or a callable that receives the Match object:
import re
# re.sub() — replace matches with a replacement string
text = "Hello   world    foo"
# Normalize multiple spaces to single space
re.sub(r' +', ' ', text)
# → 'Hello world foo'
# Use backreferences in replacement: \1, \2 (or \g<name>)
re.sub(r'(\w+)@(\w+)', r'\1 [at] \2', 'user@example.com')
# → 'user [at] example.com'
# Use a function as replacement
def redact(m):
    return '*' * len(m.group())
re.sub(r'\b\d{4}\b', redact, "Card: 4242 Expires: 0328")
# → 'Card: **** Expires: ****' (0328 is also a standalone four-digit number, so it is redacted too)
# re.subn() — same as sub but returns (new_string, count)
result, count = re.subn(r'\d+', 'NUM', 'a1 b2 c3')
# result → 'aNUM bNUM cNUM', count → 3
Flags
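Flags modify how the regex engine interprets the pattern. Pass them as the third argument to re.match(), re.search(), re.fullmatch(), re.findall() and re.finditer(), or as the second argument to re.compile(). For re.sub() and re.split(), pass them by keyword (flags=re.I): their fourth and third positional arguments are count and maxsplit, so a positionally passed flag would be interpreted as a number. The most important ones: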
import re
# re.IGNORECASE (re.I) — case-insensitive matching
re.findall(r'python', 'Python PYTHON python', re.I)
# → ['Python', 'PYTHON', 'python']
# re.MULTILINE (re.M) — ^ and $ match start/end of each LINE
text = "first line\nsecond line\nthird line"
re.findall(r'^\w+', text, re.M)
# → ['first', 'second', 'third']
# re.DOTALL (re.S) — . matches newlines too
re.search(r'start.+end', 'start\nmiddle\nend', re.S).group()
# → 'start\nmiddle\nend'
# re.VERBOSE (re.X) — allow whitespace and comments in pattern
email_pattern = re.compile(r"""
^ # start of string
[\w.%+-]+ # username
@ # literal @
[\w.-]+ # domain name
\. # literal dot
[a-zA-Z]{2,} # TLD
$ # end of string
""", re.VERBOSE)
# Combine flags with |
re.findall(r'^hello', text, re.I | re.M)
💡 re.VERBOSE is underused
The re.X / re.VERBOSE flag is one of the best ways to write maintainable regex. It allows whitespace and # comments inside your pattern. For any pattern longer than ~20 characters, consider using verbose mode with re.compile().
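For comparison, here is one pattern written both ways, as a rough sketch using an ISO-style date. It also shows the main verbose-mode gotcha: because unescaped whitespace is ignored, a literal space has to be written as [ ] or as an escaped space:

```python
import re

# The same date pattern, compact and verbose.
compact = re.compile(r'^(\d{4})-(\d{2})-(\d{2})$')

verbose = re.compile(r"""
    ^
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    $
""", re.VERBOSE)

# Both patterns accept and reject the same strings.
for s in ['2026-03-06', '2026-3-6']:
    print(s, bool(compact.match(s)), bool(verbose.match(s)))
# 2026-03-06 True True
# 2026-3-6 False False

# Whitespace in a verbose pattern is ignored, so 'a b' becomes 'ab'...
print(re.fullmatch(r'a b', 'a b', re.X))          # None
# ...and a literal space must be written as [ ] (or an escaped space).
print(bool(re.fullmatch(r'a[ ]b', 'a b', re.X)))  # True
```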
re.compile() for Reused Patterns
When you use the same pattern many times (in a loop or across multiple function calls), compile it once with re.compile(). The compiled pattern object has the same methods as the re module functions. The module-level functions already cache recently used patterns, so compiling mainly saves the cache lookup on every call and gives the pattern a reusable name:
import re

# re.compile(): compile a pattern for reuse (better performance in loops)
pattern = re.compile(r'\b[A-Z][a-z]+\b')  # capitalized words
texts = ["Hello World", "foo Bar Baz", "no caps here"]
results = [pattern.findall(t) for t in texts]
# → [['Hello', 'World'], ['Bar', 'Baz'], []]

# Compiled pattern objects have the same methods:
# pattern.match(), pattern.search(), pattern.findall()
# pattern.finditer(), pattern.sub(), pattern.split()

# re.split(): split a string by a pattern
re.split(r'[,;\s]+', "one,two; three four")
# → ['one', 'two', 'three', 'four']

# re.escape(): escape special regex characters in a string
user_input = "3.14 (approximately)"
re.findall(re.escape(user_input), "Value is 3.14 (approximately) here")
# → ['3.14 (approximately)']
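re.split() has two behaviors worth knowing beyond the basic case above: a capturing group in the pattern keeps the separators in the result, and maxsplit caps the number of splits. A short sketch with illustrative strings (maxsplit is passed by keyword):

```python
import re

# A capturing group in the split pattern keeps the separators in the result.
with_seps = re.split(r'([,;])', 'a,b;c')
print(with_seps)  # ['a', ',', 'b', ';', 'c']

# maxsplit limits the number of splits; the remainder stays in the last item.
limited = re.split(r'\s+', 'one two three four', maxsplit=2)
print(limited)    # ['one', 'two', 'three four']

# A compiled pattern exposes the same split() method.
sep = re.compile(r'[,;\s]+')
print(sep.split('one,two; three four'))  # ['one', 'two', 'three', 'four']
```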
Practical Python Regex Patterns
Here are practical starting points for common Python text-processing tasks:
import re
# 1. Email validation
email_re = re.compile(r'^[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}$')
email_re.match('user@example.com') # match
email_re.match('notanemail') # None
# 2. Extract all URLs from text
url_re = re.compile(r'https?://[\w./%-]+')
url_re.findall("Visit https://example.com or http://test.org/path")
# → ['https://example.com', 'http://test.org/path']
# 3. Validate and parse ISO date
date_re = re.compile(r'^(?P<y>\d{4})-(?P<m>0[1-9]|1[0-2])-(?P<d>0[1-9]|[12]\d|3[01])$')
m = date_re.match("2026-03-06")
m.groupdict() # {'y': '2026', 'm': '03', 'd': '06'}
# 4. Strip HTML tags
re.sub(r'<[^>]+>', '', '<p>Hello <b>world</b></p>')
# → 'Hello world'
# 5. Camel case to snake_case
def to_snake(name):
    s1 = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
to_snake('myVariableName') # 'my_variable_name'
to_snake('HTMLParser')      # 'html_parser'
Common Mistakes
- Using re.match() when you mean re.search(): The most common beginner error. Remember: match only checks the beginning of the string. Use search for "find anywhere" behavior.
- Not using raw strings: '\b' in a normal string is a backspace character; r'\b' is the regex word boundary. Always use r'' for regex patterns.
- Calling .group() without checking for None: If the pattern doesn't match, the function returns None, and None.group() raises AttributeError. Use if m := re.search(...): (Python 3.8+ walrus operator) or check if m is not None.
- Greedy vs. lazy quantifiers: .* is greedy and matches as much as possible. Use .*? (lazy) to match as little as possible, which matters when extracting content between delimiters.
- Forgetting to escape dots in patterns: . in regex matches any character. To match a literal period, use \.
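The last two mistakes are easiest to see side by side. This sketch uses made-up sample strings, with the angle-bracket tags serving simply as delimiters:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the LAST '>' it can find, producing one long match.
greedy = re.findall(r'<.*>', html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the FIRST '>' after each '<'.
lazy = re.findall(r'<.*?>', html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Unescaped '.' matches any character, so 'v1x0' slips through...
loose = re.findall(r'v\d.\d', 'v1.0 v1x0')
print(loose)   # ['v1.0', 'v1x0']

# ...while '\.' matches only a literal period.
strict = re.findall(r'v\d\.\d', 'v1.0 v1x0')
print(strict)  # ['v1.0']
```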
Python Regex Quick Reference
- re.match(p, s): match at start of string
- re.search(p, s): find first match anywhere
- re.fullmatch(p, s): entire string must match
- re.findall(p, s): list of all matches
- re.finditer(p, s): iterator of Match objects
- re.sub(p, r, s): replace all matches
- re.split(p, s): split string by pattern
- re.compile(p): compile pattern for reuse
The Python re module covers the full range of text-processing needs. Start with re.search() and re.findall() for most tasks, add named groups for readability, and compile patterns when performance matters.