March 6, 2026 · 9 min read · Regex · Python · Tutorial

Python Regex: A Complete Guide to the `re` Module

Python's built-in re module is one of the most useful tools in the standard library — once you understand the handful of core functions, you can parse text, validate inputs, extract data, and transform strings with precision. This guide covers everything you need to use Python regular expressions effectively: matching functions, groups, substitution, flags, compilation, and practical patterns for real tasks.

Raw Strings: Always Use `r'...'`

Before writing any Python regex pattern, get one habit right: always use raw strings (r'pattern'). Without the r prefix, Python interprets backslashes as escape sequences before the regex engine even sees them. For example, '\b' in a regular string is the backspace character, while r'\b' passes \b through to the regex engine as a word boundary. Unrecognized escapes such as '\d' happen to survive unchanged, but Python 3.12+ flags them with a SyntaxWarning, so relying on that is fragile.

Rule of thumb: always write r'\d+', never '\d+'.
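A quick way to see why the r prefix matters is the word-boundary escape \b, which a regular string turns into a backspace character before re ever sees it. The sample text below is just an illustration:

```python
import re

text = "a word here"

# In a regular string, '\b' is the backspace character (\x08), not a word boundary.
plain = '\bword\b'
# With the r prefix, the two characters backslash + b reach the regex engine.
raw = r'\bword\b'

print(plain == '\x08word\x08')       # True: Python already converted the escapes
print(re.search(plain, text))        # None: the text contains no backspace characters
print(re.search(raw, text).group())  # word: \b arrived intact as a word boundary
```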

`re.match()`, `re.search()`, `re.fullmatch()`

These three functions are the foundation. The key distinction is where in the string the match is attempted:

import re

# re.match() — matches at the BEGINNING of the string only
m = re.match(r'\d+', '123 apples')
m.group()   # '123'

re.match(r'\d+', 'apples 123')  # None — pattern not at start

# re.search() — finds the FIRST match anywhere in the string
m = re.search(r'\d+', 'apples 123 oranges 456')
m.group()   # '123'

# re.fullmatch() — the ENTIRE string must match the pattern
re.fullmatch(r'\d+', '123')      # match object
re.fullmatch(r'\d+', '123abc')   # None — not a full match

Function	Matches where?	Returns
re.match()	Start of string only	Match object or None
re.search()	Anywhere in string (first match)	Match object or None
re.fullmatch()	Entire string must match	Match object or None

All three return a Match object on success (truthy), or None on failure. Always check for None before calling .group(); calling it on None raises an AttributeError.
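A minimal sketch of that None check using the walrus operator (Python 3.8+). The helper name first_number is made up for illustration:

```python
import re
from typing import Optional

def first_number(text: str) -> Optional[int]:
    """Return the first run of digits in text as an int, or None if there is none."""
    # Bind and test in one step, so .group() is only called on a real Match object.
    if (m := re.search(r'\d+', text)):
        return int(m.group())
    return None

print(first_number("apples 123 oranges 456"))  # 123
print(first_number("no digits here"))          # None
```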

`re.findall()` — All Matches at Once

re.findall() returns a list of all non-overlapping matches in the string. Its behavior depends on groups in your pattern:

import re

# re.findall() — returns a list of all non-overlapping matches
text = "Order #1042 total: $89.99, Order #1043 total: $120.00"

# Extract all order numbers
re.findall(r'#(\d+)', text)
# → ['1042', '1043']

# Extract all prices
re.findall(r'\$([\d.]+)', text)
# → ['89.99', '120.00']

# With no groups: returns list of full matches
re.findall(r'\d+', text)
# → ['1042', '89', '99', '1043', '120', '00']

# With multiple groups: returns list of tuples
re.findall(r'#(\d+).+?\$([\d.]+)', text)
# → [('1042', '89.99'), ('1043', '120.00')]

Important: when your pattern has groups, findall returns the group contents, not the full match. With one group you get a list of strings; with two or more groups you get a list of tuples. This is the most common source of confusion for Python regex beginners.
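A typical way this bites: a group added only to make part of the pattern optional silently changes what findall() returns. The sketch below compares a capturing group with the non-capturing (?:...) form covered in the groups section; the sample text is illustrative:

```python
import re

text = "pi is 3.14 and the answer is 42"

# The capturing group (\.\d+) makes findall return only that group,
# and '' for matches where the optional group did not participate.
with_group = re.findall(r'\d+(\.\d+)?', text)
print(with_group)     # ['.14', '']

# A non-capturing group keeps the grouping but returns the full matches.
without_group = re.findall(r'\d+(?:\.\d+)?', text)
print(without_group)  # ['3.14', '42']

# finditer() with m.group(0) yields full matches regardless of groups.
full = [m.group(0) for m in re.finditer(r'\d+(\.\d+)?', text)]
print(full)           # ['3.14', '42']
```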

`re.finditer()` — Matches with Position Info

Use re.finditer() when you need the match position or want to process each match individually without loading all results into memory at once:

import re

# re.finditer() — returns an iterator of Match objects
# Better than findall when you need match position or full match details
text = "cat sat on the mat"

for m in re.finditer(r'[cm]at', text):
    print(f"Found '{m.group()}' at position {m.start()}-{m.end()}")

# Output:
# Found 'cat' at position 0-3
# Found 'mat' at position 15-18

# Access span: m.start(), m.end(), m.span()
# Access full match: m.group() or m.group(0)
# Access groups: m.group(1), m.group(2), etc.
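Because each Match object carries its offsets, finditer() makes it straightforward to turn matches into line and column numbers. This is a sketch only; the TODO-comment format and the helper name find_todos are assumptions for illustration:

```python
import re
from typing import List, Tuple

# Hypothetical convention: "TODO" optionally followed by ":" and a note on the same line.
TODO_RE = re.compile(r'TODO:?[ \t]*(.*)')

def find_todos(source: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, note) for every TODO, with 1-based line and column."""
    results = []
    for m in TODO_RE.finditer(source):
        start = m.start()
        line = source.count('\n', 0, start) + 1        # newlines before the match
        line_start = source.rfind('\n', 0, start) + 1  # index where that line begins
        results.append((line, start - line_start + 1, m.group(1)))
    return results

code = "x = 1\n# TODO: fix this\ny = 2  # TODO tidy\n"
print(find_todos(code))  # [(2, 3, 'fix this'), (3, 10, 'tidy')]
```

Using [ \t]* rather than \s* after the marker keeps a bare "TODO" at the end of a line from swallowing the newline and capturing the next line as its note.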

Groups: Capturing and Naming

Groups let you extract specific parts of a match. Use parentheses to create a capturing group, (?P<name>...) for a named group, and (?:...) for a non-capturing group when you need grouping without capture:

import re

# Named groups: (?P<name>pattern)
date_pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = date_pattern.search("Today is 2026-03-06")

m.group('year')   # '2026'
m.group('month')  # '03'
m.group('day')    # '06'
m.groupdict()     # {'year': '2026', 'month': '03', 'day': '06'}

# Numbered groups: group(1), group(2), ...
m2 = re.search(r'(\w+)@(\w+)\.(\w+)', 'user@example.com')
m2.group(1)  # 'user'
m2.group(2)  # 'example'
m2.group(3)  # 'com'

# Non-capturing group: (?:...) — group but don't capture
re.findall(r'(?:https?://)([\w.]+)', 'https://example.com http://test.org')
# → ['example.com', 'test.org']

Named groups (?P<name>...) make patterns significantly more readable and maintainable, especially for date/time parsing, log extraction, and any pattern with three or more groups. Use m.groupdict() to get all named captures as a dictionary.
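As an example of the log-extraction case, here is a sketch that turns log lines into dictionaries with groupdict(). The log format (date, time, level, and message separated by single spaces) is a hypothetical one made up for this illustration:

```python
import re

# Hypothetical log format: "YYYY-MM-DD HH:MM:SS LEVEL message"
LOG_RE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) '
    r'(?P<message>.*)'
)

def parse_log(lines):
    """Return one dict per line that fully matches LOG_RE; skip lines that don't."""
    return [m.groupdict() for line in lines if (m := LOG_RE.fullmatch(line))]

entries = parse_log(["2026-03-06 12:00:01 ERROR Disk full", "not a log line"])
print(entries)
# [{'date': '2026-03-06', 'time': '12:00:01', 'level': 'ERROR', 'message': 'Disk full'}]
```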

`re.sub()` — Find and Replace

re.sub(pattern, replacement, string) replaces all matches of the pattern with the replacement. The replacement can be a string with backreferences or a callable that receives the Match object:

import re

# re.sub() — replace matches with a replacement string
text = "Hello   world   foo"

# Normalize multiple spaces to single space
re.sub(r' +', ' ', text)
# → 'Hello world foo'

# Use backreferences in replacement: \1, \2 (or \g<name>)
re.sub(r'(\w+)@(\w+)', r'\1 [at] \2', 'user@example.com')
# → 'user [at] example.com'

# Use a function as replacement
def redact(m):
    return '*' * len(m.group())

re.sub(r'\b\d{4}\b', redact, "Card: 4242 Expires: 0328")
# → 'Card: **** Expires: ****'  (0328 is also a standalone four-digit number)

# re.subn() — same as sub but returns (new_string, count)
result, count = re.subn(r'\d+', 'NUM', 'a1 b2 c3')
# result → 'aNUM bNUM cNUM', count → 3
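The \g<name> form mentioned in the comments above is not exercised in that snippet, so here is a small sketch. The date-reordering task and the helper names iso_to_dmy and bracket_numbers are illustrative assumptions:

```python
import re

DATE_RE = re.compile(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})')

def iso_to_dmy(text: str) -> str:
    # \g<name> inserts the text captured by a named group
    return DATE_RE.sub(r'\g<d>/\g<m>/\g<y>', text)

def bracket_numbers(text: str) -> str:
    # \g<0> inserts the entire match
    return re.sub(r'\d+', r'[\g<0>]', text)

print(iso_to_dmy("Due 2026-03-06, paid 2026-02-28"))  # Due 06/03/2026, paid 28/02/2026
print(bracket_numbers("a1 b22"))                      # a[1] b[22]
```

The \g<...> syntax also removes ambiguity with numbered groups: \g<2>0 means "group 2 followed by a literal 0", whereas \20 would be read as a reference to group 20.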

Flags

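Flags modify how the regex engine interprets the pattern. For re.match(), re.search(), re.fullmatch(), re.findall(), and re.finditer(), pass them as the third argument; for re.sub() and re.split(), pass them as flags=... because the positional slot after the string is count or maxsplit there. You can also pass them to re.compile(). The most important ones: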

import re

# re.IGNORECASE (re.I) — case-insensitive matching
re.findall(r'python', 'Python PYTHON python', re.I)
# → ['Python', 'PYTHON', 'python']

# re.MULTILINE (re.M) — ^ and $ match start/end of each LINE
text = "first line\nsecond line\nthird line"
re.findall(r'^\w+', text, re.M)
# → ['first', 'second', 'third']

# re.DOTALL (re.S) — . matches newlines too
re.search(r'start.+end', 'start\nmiddle\nend', re.S).group()
# → 'start\nmiddle\nend'

# re.VERBOSE (re.X) — allow whitespace and comments in pattern
email_pattern = re.compile(r"""
    ^                   # start of string
    [\w.%+-]+          # username
    @                   # literal @
    [\w.-]+            # domain name
    \.                 # literal dot
    [a-zA-Z]{2,}        # TLD
    $                   # end of string
""", re.VERBOSE)

# Combine flags with |
re.findall(r'^hello', text, re.I | re.M)
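Passing a flag positionally only works for the functions whose third parameter is flags. In re.sub() the positional slot after the string is count, and in re.split() it is maxsplit, so a sketch of the safe habit is to pass flags by keyword:

```python
import re

text = "Cat cat CAT"

# Pass flags by keyword to sub() and split(): the positional slot after
# the string is count (sub) or maxsplit (split), not flags.
print(re.sub(r'cat', 'dog', text, flags=re.IGNORECASE))  # dog dog dog
print(re.split(r'x', 'aXbxc', flags=re.I))                # ['a', 'b', 'c']

# Pitfall, not executed here: re.sub(r'cat', 'dog', text, re.I)
# binds re.I (integer value 2) to count, so matching stays case-sensitive
# and at most two replacements are made.
print(int(re.I))  # 2
```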

💡 re.VERBOSE is underused

The re.X / re.VERBOSE flag is one of the best ways to write maintainable regex. It allows whitespace and # comments inside your pattern. For any pattern longer than ~20 characters, consider using verbose mode with re.compile().
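One detail to keep in mind with verbose mode: unescaped whitespace in the pattern is ignored, so a literal space must be escaped (\ ) or placed in a character class. A sketch with a made-up two-word name pattern:

```python
import re

# Correct: the literal space is written as [ ], which verbose mode keeps.
name_re = re.compile(r"""
    (?P<first>[A-Z][a-z]+)   # first name
    [ ]                      # literal space; a bare space would be ignored
    (?P<last>[A-Z][a-z]+)    # last name
""", re.VERBOSE)

# Buggy: the bare space between the two words is silently dropped.
wrong_re = re.compile(r"""
    [A-Z][a-z]+ [A-Z][a-z]+   # this space does NOT match anything
""", re.VERBOSE)

print(name_re.fullmatch("Ada Lovelace").groupdict())  # {'first': 'Ada', 'last': 'Lovelace'}
print(wrong_re.fullmatch("Ada Lovelace"))             # None
print(wrong_re.fullmatch("AdaLovelace") is not None)  # True
```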

`re.compile()` for Reused Patterns

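When you use the same pattern multiple times (in a loop, across multiple function calls), compile it once with re.compile(). The compiled pattern object has the same methods as the re module functions. The module-level functions already cache recently used patterns, so the speedup is mostly the skipped cache lookup, but a named compiled pattern also keeps the regex and its flags in one place: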

import re

# re.compile() — compile a pattern for reuse (better performance in loops)
pattern = re.compile(r'\b[A-Z][a-z]+\b')  # capitalized words

texts = ["Hello World", "foo Bar Baz", "no caps here"]
results = [pattern.findall(t) for t in texts]
# → [['Hello', 'World'], ['Bar', 'Baz'], []]

# Compiled pattern objects have the same methods:
# pattern.match(), pattern.search(), pattern.findall()
# pattern.finditer(), pattern.sub(), pattern.split()

# re.split() — split a string by a pattern
re.split(r'[,;\s]+', "one,two; three  four")
# → ['one', 'two', 'three', 'four']

# re.escape() — escape special regex characters in a string
user_input = "3.14 (approximately)"
re.findall(re.escape(user_input), "Value is 3.14 (approximately) here")
# → ['3.14 (approximately)']
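Compiled patterns also accept optional pos and endpos arguments that restrict where matching happens, which the module-level functions do not offer. A short sketch; the sample text is illustrative:

```python
import re

word_re = re.compile(r'\b[A-Z][a-z]+\b')
text = "Hello World and Goodbye Moon"

print(word_re.findall(text))         # ['Hello', 'World', 'Goodbye', 'Moon']
print(word_re.findall(text, 6))      # start at index 6: ['World', 'Goodbye', 'Moon']
print(word_re.findall(text, 0, 11))  # acts like text[:11] == 'Hello World': ['Hello', 'World']

# ^ still means the real start of the string (or just after a newline), not pos.
print(re.compile(r'^World').search(text, 6))  # None
```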

Practical Python Regex Patterns

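Here are practical starting-point patterns for common Python text-processing tasks. They handle typical input but are not complete validators: the email pattern is a simplification of the full address grammar, and regex-based tag stripping is no substitute for an HTML parser.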

import re

# 1. Email validation
email_re = re.compile(r'^[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,}$')
email_re.match('user@example.com')   # match
email_re.match('notanemail')          # None

# 2. Extract all URLs from text
url_re = re.compile(r'https?://[\w./%-]+')
url_re.findall("Visit https://example.com or http://test.org/path")
# → ['https://example.com', 'http://test.org/path']

# 3. Validate and parse ISO date
date_re = re.compile(r'^(?P<y>\d{4})-(?P<m>0[1-9]|1[0-2])-(?P<d>0[1-9]|[12]\d|3[01])$')
m = date_re.match("2026-03-06")
m.groupdict()  # {'y': '2026', 'm': '03', 'd': '06'}

# 4. Strip HTML tags
re.sub(r'<[^>]+>', '', '<p>Hello <b>world</b></p>')
# → 'Hello world'

# 5. Camel case to snake_case
def to_snake(name):
    s1 = re.sub(r'(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

to_snake('myVariableName')   # 'my_variable_name'
to_snake('HTMLParser')       # 'html_parser'
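The ISO date pattern above checks the shape of a date, not whether the date exists: it accepts 2026-02-31. A sketch of pairing the regex with datetime.date, assuming you want None for any invalid input. It uses fullmatch() instead of ^...$ because $ also matches just before a trailing newline:

```python
import re
from datetime import date
from typing import Optional

ISO_DATE_RE = re.compile(r'(?P<y>\d{4})-(?P<m>0[1-9]|1[0-2])-(?P<d>0[1-9]|[12]\d|3[01])')

def parse_iso_date(s: str) -> Optional[date]:
    """Return a date for a well-formed, real calendar date; otherwise None."""
    m = ISO_DATE_RE.fullmatch(s)  # the whole string must match, no trailing newline allowed
    if m is None:
        return None
    try:
        # The regex guarantees the format; date() rejects impossible days such as Feb 31.
        return date(int(m['y']), int(m['m']), int(m['d']))
    except ValueError:
        return None

print(parse_iso_date("2026-03-06"))    # 2026-03-06
print(parse_iso_date("2026-02-31"))    # None: right shape, no such day
print(parse_iso_date("2026-03-06\n"))  # None: fullmatch rejects the trailing newline
```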

Common Mistakes

Using re.match() when you mean re.search(): The most common beginner error. Remember: match only checks the beginning of the string. Use search for "find anywhere" behavior.
Not using raw strings: in a regular string '\b' becomes a backspace character, while r'\b' is the regex word boundary. Always use r'' for regex patterns.
Calling .group() without checking for None: If the pattern doesn't match, the function returns None and None.group() raises AttributeError. Use if m := re.search(...) (Python 3.8+ walrus operator) or check if m is not None.
Greedy vs. lazy quantifiers: .* is greedy and matches as much as possible. Use .*? (lazy) to match as little as possible — important for extracting content between delimiters.
Forgetting to escape dots in patterns: . in regex matches any character. To match a literal period, use \. or the character class [.].
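The last two mistakes are easiest to see side by side. The inputs below are illustrative:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy .* runs on to the LAST '>' it can reach.
greedy = re.findall(r'<.*>', html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Lazy .*? stops at the FIRST '>' after each '<'.
lazy = re.findall(r'<.*?>', html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# An unescaped . matches any character; \. matches only a literal period.
print(re.findall(r'v1.0', 'v1.0 v1x0'))   # ['v1.0', 'v1x0']
print(re.findall(r'v1\.0', 'v1.0 v1x0'))  # ['v1.0']
```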

Generate Python regex patterns from plain English

Describe what you want to match — "extract email addresses", "validate ISO dates", "find all URLs" — and RegSQL generates the correct Python regex pattern with a full explanation of how it works.


Python Regex Quick Reference

re.match(p, s) — match at start of string
re.search(p, s) — find first match anywhere
re.fullmatch(p, s) — entire string must match
re.findall(p, s) — list of all matches
re.finditer(p, s) — iterator of Match objects
re.sub(p, r, s) — replace all matches
re.split(p, s) — split string by pattern
re.compile(p) — compile pattern for reuse

The Python re module covers the full range of text-processing needs. Start with re.search() and re.findall() for most tasks, add named groups for readability, and compile patterns when performance matters.
