Regular Expressions: The Complete Developer's Guide

Q: What is the difference between greedy and lazy quantifiers in regex?

Greedy quantifiers (*, +, ?, {n,m}) match as much text as possible, then backtrack if the rest of the pattern fails. Lazy (non-greedy) quantifiers (*?, +?, ??, {n,m}?) match as little text as possible, then expand if needed. For example, given the input '<b>hello</b> and <b>world</b>', the greedy pattern <b>.*</b> matches '<b>hello</b> and <b>world</b>' (everything between the first <b> and the last </b>), while the lazy pattern <b>.*?</b> matches '<b>hello</b>' and '<b>world</b>' separately. Use lazy quantifiers when you want the shortest possible match, and greedy when you want the longest.

Q: What causes catastrophic backtracking in regex and how do I prevent it?

Catastrophic backtracking occurs when a regex engine tries exponentially many paths through a pattern before determining that no match exists. It is caused by nested quantifiers on overlapping patterns, such as (a+)+ or (.*a){10}. On non-matching input, the engine backtracks through every possible combination of how characters can be distributed among the quantifiers. To prevent it: avoid nested quantifiers where the inner pattern can match the same characters as the outer one; use atomic groups (?>...) or possessive quantifiers (++, *+) when available; prefer specific character classes like [^<]* instead of .*; and test your patterns against long non-matching input to check for slowdowns.

February 11, 2026 · DevToolbox Team

Regular expressions are the universal language for pattern matching in text. Every mainstream programming language supports them. Every serious code editor relies on them. Every log file, data pipeline, and text processing workflow benefits from them. And yet, most developers treat regex as a dark art, copy-pasting patterns from Stack Overflow without truly understanding how they work.

This guide changes that. We start from absolute fundamentals and build all the way to advanced topics like lookahead assertions, catastrophic backtracking, and cross-language differences. Whether you are writing your first regex or debugging a complex pattern that runs 100 times slower than it should, this is the reference you need.

By the end of this guide, you will understand every major regex feature, know the syntax for six programming languages, have a library of common patterns to copy, and be able to write performant regular expressions with confidence.

What Are Regular Expressions?
Regex Syntax Basics
Quantifiers: Greedy, Lazy, and Possessive
Anchors and Boundaries
Groups and Capturing
Lookahead and Lookbehind
Character Classes and Shorthand
Flags and Modifiers
Common Regex Patterns
Regex in Different Languages
Performance Tips and Catastrophic Backtracking
Frequently Asked Questions

1. What Are Regular Expressions?

A regular expression (regex or regexp) is a sequence of characters that defines a search pattern. When you apply a regex to a string, the regex engine scans the string from left to right, attempting to find substrings that match the pattern you described.

At the simplest level, a regex can be a literal string. The pattern hello matches the exact text "hello" wherever it appears. But the real power of regex comes from metacharacters: special characters that represent classes of characters, repetitions, positions, and logical conditions.

Regular expressions were formalized in the 1950s by mathematician Stephen Kleene as part of formal language theory. Ken Thompson implemented them in the QED text editor in 1968, and they spread to grep, sed, awk, Perl, and eventually every modern language. Today, the most common flavors are PCRE (Perl-Compatible Regular Expressions), JavaScript regex, Python's re module, Java's java.util.regex, Go's regexp, and .NET regex.

Here is a simple example to ground the discussion:

# Pattern: \b\d{3}-\d{4}\b
# Matches: 7-digit phone numbers like 555-1234

# Input:  "Call 555-1234 or 800-555-9999 for info"
# Match:  555-1234 (the first match)

# \b     = word boundary
# \d{3}  = exactly 3 digits
# -      = literal hyphen
# \d{4}  = exactly 4 digits
# \b     = word boundary

This pattern uses word boundaries (\b), digit shorthand (\d), and quantifiers ({3}, {4}) to match a specific number format. Every piece of that pattern is covered in the sections that follow.
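
To see the pattern in action, here is a minimal sketch using Python's re module (an assumption for illustration; any engine with \b and \d should behave the same way on this input). The names PHONE and text are hypothetical. It collects every match rather than only the first, which also shows that the tail of the longer number fits the 7-digit format.

```python
import re

# Hypothetical names; the pattern and the input come from the example above.
PHONE = re.compile(r'\b\d{3}-\d{4}\b')
text = "Call 555-1234 or 800-555-9999 for info"

first = PHONE.search(text).group(0)   # the first match only
all_matches = PHONE.findall(text)     # every non-overlapping match

print(first)
print(all_matches)
```

If the second result is unwanted, the pattern needs more context than word boundaries alone, for example a lookbehind that rejects a preceding hyphen.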

2. Regex Syntax Basics

Literal Characters

Most characters in a regex match themselves literally. The pattern cat matches the letter "c" followed by "a" followed by "t". Case matters by default: cat does not match "Cat" unless you enable case-insensitive mode.

Metacharacters

The following characters have special meaning in regex and must be escaped with a backslash to match literally:

Metacharacter	Meaning	To Match Literally
`.`	Any character except newline	`\.`
`*`	Zero or more of the preceding element	`\*`
`+`	One or more of the preceding element	`\+`
`?`	Zero or one of the preceding element	`\?`
`^`	Start of string (or line in multiline mode)	`\^`
`$`	End of string (or line in multiline mode)	`\$`
`{` `}`	Quantifier range	`\{` `\}`
`[` `]`	Character class	`\[` `\]`
`(` `)`	Grouping and capturing	`\(` `\)`
`|`	Alternation (OR)	`\|`
`\`	Escape character	`\\`

The Dot: Matching Any Character

The dot . is the most commonly used metacharacter. It matches any single character except a newline (unless the s / dotall flag is enabled). For example:

# Pattern: h.t
# Matches: "hat", "hot", "hit", "h t", "h3t"
# Does NOT match: "ht" (dot requires exactly one character)

# Pattern: .+
# Matches: any string of one or more characters (on a single line)

# Pattern: \.txt$
# Matches: filenames ending in ".txt" (dot is escaped to match literally)

Alternation: The OR Operator

The pipe | acts as a logical OR. It matches whatever is on the left side or the right side:

# Pattern: cat|dog
# Matches: "cat" or "dog"

# Pattern: (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day
# Matches: any day of the week

# Be careful with scope:
# abc|def    matches "abc" OR "def" (the whole pattern, not just the characters)
# ab(c|d)ef  matches "abcef" OR "abdef" (alternation is scoped to the group)

Alternation has the lowest precedence among regex operators, so cat|dog food matches "cat" or "dog food", not "cat food" or "dog food". Use parentheses to control scope: (cat|dog) food.
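
The precedence rule is easy to verify. The sketch below assumes Python's re module and uses fullmatch, which requires the whole string to match, so the scope of each alternative becomes visible. The variable names are illustrative only.

```python
import re

unscoped = re.compile(r'cat|dog food')    # "cat" OR "dog food"
scoped = re.compile(r'(cat|dog) food')    # ("cat" OR "dog") followed by " food"

samples = ("cat", "dog food", "cat food")

# Map each sample to whether the whole string matches.
unscoped_results = {s: bool(unscoped.fullmatch(s)) for s in samples}
scoped_results = {s: bool(scoped.fullmatch(s)) for s in samples}

print(unscoped_results)
print(scoped_results)
```

Only the scoped pattern accepts "cat food", and it no longer accepts a bare "cat".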

3. Quantifiers: Greedy, Lazy, and Possessive

Quantifiers control how many times a preceding element can repeat. They are fundamental to writing patterns that match variable-length text.

Basic Quantifiers

Quantifier	Meaning	Example	Matches
`*`	Zero or more	`ab*c`	"ac", "abc", "abbc", "abbbc"
`+`	One or more	`ab+c`	"abc", "abbc", "abbbc" (not "ac")
`?`	Zero or one	`colou?r`	"color", "colour"
`{n}`	Exactly n	`\d{4}`	"2026" (exactly 4 digits)
`{n,}`	n or more	`\d{2,}`	"42", "123", "9999" (2+ digits)
`{n,m}`	Between n and m	`\w{3,8}`	Words of 3 to 8 characters

Greedy vs. Lazy (Non-Greedy) Quantifiers

By default, all quantifiers are greedy: they match as much text as possible while still allowing the overall pattern to succeed. Adding a ? after a quantifier makes it lazy (or non-greedy): it matches as little text as possible.

# Input: <div>Hello</div><div>World</div>

# Greedy:  <div>.*</div>
# Matches: <div>Hello</div><div>World</div>  (the ENTIRE string)
# The .* eats everything, then backtracks just enough for </div> to match

# Lazy:    <div>.*?</div>
# Matches: <div>Hello</div>  (first match)
#          <div>World</div>  (second match)
# The .*? matches as little as possible

# The lazy versions of each quantifier:
# *?   = zero or more (lazy)
# +?   = one or more (lazy)
# ??   = zero or one (lazy)
# {n,m}? = between n and m (lazy, matches closer to n)

A common mistake is thinking lazy quantifiers are always better. They are not. A lazy quantifier can be slower because the engine retries the rest of the pattern after every character it consumes, expanding one step at a time. A negated character class is usually the better choice:

# Instead of lazy .*? between quotes:
".*?"         # Works but can be slow on long inputs

# Use a negated character class (faster and more precise):
"[^"]*"       # Matches everything that is NOT a quote between quotes

# Instead of lazy .*? between HTML tags:
<div>.*?</div>     # Works
<div>[^<]*</div>   # Faster (matches non-< characters)
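
The three variants can be compared side by side. This is a small sketch assuming Python's re module; it checks what each pattern matches, not how fast it runs.

```python
import re

html = "<div>Hello</div><div>World</div>"

greedy = re.findall(r'<div>.*</div>', html)      # one match spanning everything
lazy = re.findall(r'<div>.*?</div>', html)       # one match per element
negated = re.findall(r'<div>[^<]*</div>', html)  # same result, no dot involved

print(greedy)
print(lazy)
print(negated)
```

One trade-off to keep in mind: the negated class stops at any "<", so it finds nothing in input like <div>a<b>x</b></div>, where the lazy version would still match.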

Possessive Quantifiers

Possessive quantifiers (supported in Java, PCRE, Python 3.11+, and some other engines, but NOT in JavaScript) match as much as possible and never backtrack. They are written by adding + after a quantifier:

# Possessive quantifiers (Java/PCRE):
# *+   = zero or more (possessive)
# ++   = one or more (possessive)
# ?+   = zero or one (possessive)
# {n,m}+ = between n and m (possessive)

# Example:
# Pattern: \d++\b
# On input "12345abc", \d++ matches "12345" and won't give back any digits
# If \b fails, the entire match fails immediately (no backtracking)

# This prevents catastrophic backtracking in patterns like:
# ^(a++)+$ on "aaaaaab": fails quickly instead of trying 2^n paths

4. Anchors and Boundaries

Anchors do not match characters. They match positions in the string. They are zero-width assertions that constrain where a pattern can match.

Start and End Anchors

# ^  = start of string (or start of line with /m flag)
# $  = end of string (or end of line with /m flag)

# Pattern: ^Hello
# Matches "Hello world" (starts with "Hello")
# Does NOT match "Say Hello" (not at start)

# Pattern: world$
# Matches "Hello world" (ends with "world")
# Does NOT match "world peace" (not at end)

# Pattern: ^\d+$
# Matches strings that are ENTIRELY digits: "12345", "0", "999"
# Does NOT match "abc123" or "123abc" or "12 34"

# Combined with multiline flag (/m):
# ^  matches start of each LINE
# $  matches end of each LINE
# Without /m, they match only the start/end of the ENTIRE string

Word Boundaries

# \b  = word boundary (between a \w and a \W character, or between a \w character and the start/end of string)
# \B  = non-word boundary (opposite of \b)

# Pattern: \bcat\b
# Matches: "the cat sat" (the word "cat")
# Does NOT match: "concatenate" or "catalog" (\b prevents partial matches)

# Pattern: \Bcat\B
# Matches: "concatenate" (cat surrounded by word characters)
# Does NOT match: "cat" or "cat " (requires non-boundaries on both sides)

# Pattern: \b\w+\b
# Matches every whole word in the input

# Practical use: replace whole words only
# In JavaScript:
"He is heroic".replace(/\bhe\b/gi, "she");
// "she is heroic" (does NOT change "heroic" to "sheroic")

Other Anchors

# \A  = absolute start of string (never affected by /m flag)
# \Z  = end of string or before final newline
# \z  = absolute end of string (never affected by /m flag)
# Note: \A, \Z, \z are supported in Java, PCRE, Ruby, .NET
# Python supports \A and \Z, but its \Z matches only at the absolute end (like \z elsewhere)
# JavaScript does NOT support \A, \Z, \z

# These are useful when you need ^ and $ anchors but also have /m enabled:
# In multiline mode, ^ and $ match line boundaries
# But \A and \z always match the absolute string boundaries
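
The effect of the multiline flag on anchors can be checked directly. The sketch below assumes Python's re module, where the flag is spelled re.MULTILINE; the sample text is made up for illustration.

```python
import re

text = "12\nabc\n34"

# Without the flag, ^ and $ bind to the whole string, so nothing is all digits.
whole_string = re.findall(r'^\d+$', text)

# With the flag, ^ and $ bind to each line.
per_line = re.findall(r'^\d+$', text, flags=re.MULTILINE)

# \A ignores the flag and only matches at the very start of the string.
absolute = re.findall(r'\A\d+', text, flags=re.MULTILINE)

print(whole_string, per_line, absolute)
```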

5. Groups and Capturing

Parentheses serve two purposes in regex: they group elements together (for quantifiers and alternation) and they capture the matched text for later reference.

Numbered Capture Groups

# Every pair of parentheses creates a numbered group, starting from 1.
# The numbering follows the order of opening parentheses, left to right.

# Pattern: (\d{4})-(\d{2})-(\d{2})
# Input:   "2026-02-11"
# Group 1: "2026" (year)
# Group 2: "02"   (month)
# Group 3: "11"   (day)

# Nested groups:
# Pattern: ((https?)://(\w+\.com))
# Input:   "https://example.com"
# Group 1: "https://example.com" (entire URL)
# Group 2: "https"               (protocol)
# Group 3: "example.com"         (domain)

# Use in replacements:
# JavaScript: "2026-02-11".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1")
# Result:     "02/11/2026"

# Python: re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', "2026-02-11")
# Result: "02/11/2026"

Named Capture Groups

Named groups make complex patterns more readable by assigning meaningful names to captures. The syntax varies slightly between languages:

# JavaScript:     (?<name>pattern)
# PCRE:           (?<name>pattern) or (?P<name>pattern)
# Python:         (?P<name>pattern)
# Java:           (?<name>pattern)
# .NET:           (?<name>pattern) or (?'name'pattern)

# Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

# JavaScript:
"2026-02-11".replace(
    /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
    "$<month>/$<day>/$<year>"
);
// "02/11/2026"

# JavaScript (accessing groups in code):
const match = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/.exec("2026-02-11");
console.log(match.groups.year);   // "2026"
console.log(match.groups.month);  // "02"

# Python:
import re
m = re.match(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', "2026-02-11")
m.group('year')   # "2026"
m.group('month')  # "02"

Non-Capturing Groups

# (?:pattern) groups elements without creating a capture.
# Use when you need grouping for quantifiers or alternation but don't need the match.

# Without non-capturing group (wastes a group number):
# Pattern: (https?)://(\w+\.com)
# Group 1: "https" (we might not need this)
# Group 2: "example.com"

# With non-capturing group:
# Pattern: (?:https?)://(\w+\.com)
# Group 1: "example.com" (cleaner numbering)

# Common use: optional protocol prefix
# Pattern: (?:https?://)?(\w+\.com)
# Matches: "https://example.com" -> Group 1: "example.com"
# Matches: "example.com" -> Group 1: "example.com"

# Another use: alternation with quantifier
# Pattern: (?:ab|cd){3}
# Matches: "ababab", "cdcdcd", "abcdab", etc.
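
Group numbering also affects what some APIs hand back. As a sketch, assuming Python's re module: findall returns the captured groups rather than the whole match whenever the pattern contains groups, so a non-capturing prefix keeps the result to just the part you care about. The sample text is hypothetical.

```python
import re

text = "see https://example.com or test.com"

# Two capturing groups: findall returns one tuple per match.
with_capture = re.findall(r'(https?)://(\w+\.com)', text)

# Non-capturing optional prefix: only the domain is captured.
domains = re.findall(r'(?:https?://)?(\w+\.com)', text)

print(with_capture)
print(domains)
```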

Backreferences

Backreferences let you match the same text that was captured by an earlier group. This is powerful for finding repeated patterns:

# \1 refers to whatever Group 1 matched (not the pattern, the actual text)

# Pattern: \b(\w+)\s+\1\b
# Matches: "the the" (repeated word)
# Matches: "hello hello" (repeated word)
# Does NOT match: "the them" (the trailing \b stops \1 from matching only the "the" inside "them")

# Find duplicate lines:
# Pattern: ^(.+)$\n\1$  (with /m flag)

# Match opening and closing HTML tags:
# Pattern: <(\w+)>.*?</\1>
# Matches: <div>content</div>
# Matches: <span>text</span>
# Does NOT match: <div>content</span> (mismatched tags)

# Named backreferences:
# JavaScript: \k<name>
# Python:     (?P=name)
# Java:       \k<name>
# PCRE:       \k<name> or (?P=name)

# Pattern: <(?<tag>\w+)>.*?</\k<tag>>
# Same as above but with named groups
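
Word boundaries matter when a backreference is used to find repeated words. The following sketch assumes Python's re module and a made-up sample sentence. It shows that a backreference without a trailing \b is satisfied by the beginning of a longer word, while the bounded version reports only true repeats.

```python
import re

text = "the the them hello hello"

# No trailing boundary: \1 is satisfied by the first three letters of "them".
loose = re.search(r'(\w+)\s+\1', "the them")

# With boundaries on both sides, only whole repeated words are reported.
# findall returns the content of group 1 for each match.
repeated = re.findall(r'\b(\w+)\s+\1\b', text)

print(loose.group(0))
print(repeated)
```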

6. Lookahead and Lookbehind

Lookaround assertions are zero-width patterns that match a position based on what comes before or after it, without including that context in the match. They are one of the most powerful features in modern regex.

Positive Lookahead: (?=...)

Matches a position that is followed by the specified pattern:

# Pattern: \d+(?= dollars)
# Input:   "I have 100 dollars and 50 euros"
# Matches: "100" (followed by " dollars")
# Does NOT match: "50" (followed by " euros")
# Note: " dollars" is NOT part of the match, only "100"

# Practical: match a word only when followed by a specific context
# Pattern: \w+(?=\()
# Input:   "print(x) and y = 5"
# Matches: "print" (followed by opening parenthesis)

# Insert commas into numbers:
"1234567".replace(/\B(?=(\d{3})+$)/g, ",");
// "1,234,567"

Negative Lookahead: (?!...)

Matches a position that is NOT followed by the specified pattern:

# Pattern: \d{3}(?!-)
# Input:   "555-1234 and 999"
# Matches: "123" within "1234", and "999"
# Does NOT match: "555" (followed by "-")

# Match "foo" only when NOT followed by "bar":
# Pattern: foo(?!bar)
# Input:   "foobar foobaz foo"
# Matches: "foo" in "foobaz" and standalone "foo"

# Practical: match .js files but not .json:
# Pattern: \.js(?!on)\b
# Matches: "app.js", "utils.js"
# Does NOT match: "config.json"

# Password validation (must NOT contain spaces):
# Pattern: ^(?!.*\s).{8,}$
# Matches strings of 8+ characters with no spaces

Positive Lookbehind: (?<=...)

Matches a position that is preceded by the specified pattern. Supported in JavaScript (ES2018+), Python, Java, .NET, and PCRE:

# Pattern: (?<=\$)\d+
# Input:   "Price $100, code 200"
# Matches: "100" (preceded by "$")
# Does NOT match: "200" (not preceded by "$")

# Pattern: (?<=@)\w+
# Input:   "user@example.com"
# Matches: "example" (preceded by "@")

# Extract values after specific labels:
# Pattern: (?<=version:\s)\d+\.\d+\.\d+
# Input:   "version: 2.5.3"
# Matches: "2.5.3"

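# Note on variable-length lookbehind:
# Python's re module requires FIXED-WIDTH lookbehind (all alternatives the same length)
# PCRE has traditionally required fixed-length alternatives; Java allows bounded lengths
# JavaScript and .NET allow VARIABLE-LENGTH lookbehind
# (?<=ab|cde) works in JavaScript, Java, PCRE, and .NET, but not in Python's re
# (?<=\w+)    works in JavaScript and .NET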

Negative Lookbehind: (?<!...)

# Pattern: (?<!\$)\d+
# Input:   "Price $100, code 200"
# Matches: "00" within "$100" (the digits not preceded by $), and "200"
# To match whole numbers not preceded by $:
# Pattern: (?<!\$)\b\d+\b
# Matches: "200" only

# Match "port" but not in "airport" or "report":
# Pattern: (?<!air|re)port\b
# Input:   "airport report port export"
# Matches: "port" (standalone) and "port" in "export"

# Practical: replace foo that's NOT inside a comment:
# Pattern: (?<!//\s*)foo
# This needs variable-length lookbehind (JavaScript, .NET) and is a simplified approach;
# real comment detection requires parsing
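
Not every engine accepts a lookbehind whose alternatives differ in length. Python's built-in re module, at least in current CPython releases, rejects (?<!air|re) for that reason. A portable workaround, sketched below under that assumption, is to stack one fixed-width lookbehind per alternative; both are checked at the same position.

```python
import re

text = "airport report port export"

# Two fixed-width negative lookbehinds instead of one alternation.
PORT = re.compile(r'(?<!air)(?<!re)port\b')
matches = [(m.start(), m.group(0)) for m in PORT.finditer(text)]

# The single-lookbehind form fails to compile in Python's re module
# because "air" and "re" have different widths.
try:
    re.compile(r'(?<!air|re)port\b')
    alternation_accepted = True
except re.error:
    alternation_accepted = False

print(matches)
print(alternation_accepted)
```

The stacked form finds the standalone "port" and the one inside "export", the same result the alternation gives in engines that accept it.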

Combining Lookaround Assertions

# You can combine multiple lookaheads and lookbehinds:

# Password validation: 8+ chars, at least one uppercase, one lowercase, one digit
# Pattern: ^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$
# (?=.*[A-Z])  — at least one uppercase letter anywhere
# (?=.*[a-z])  — at least one lowercase letter anywhere
# (?=.*\d)     — at least one digit anywhere
# .{8,}        — at least 8 characters total

# Match a number with a letter directly before and after it:
# Pattern: (?<=[a-zA-Z])\d+(?=[a-zA-Z])
# Input:   "abc123def 456"
# Matches: "123" (surrounded by letters)

# Match text between delimiters without including delimiters:
# Pattern: (?<=\[)[^\]]+(?=\])
# Input:   "array[index] and map[key]"
# Matches: "index", "key"
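
The stacked lookaheads from the password example translate directly into a small validation helper. This is a sketch assuming Python's re module; is_strong and STRONG are hypothetical names. It uses fullmatch in place of the ^ and $ anchors, since Python's $ also matches just before a trailing newline.

```python
import re

# Each lookahead scans from the start of the string without consuming anything;
# .{8,} then enforces the minimum length.
STRONG = re.compile(r'(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}')

def is_strong(password: str) -> bool:
    """Return True when all three lookaheads and the length check pass."""
    return STRONG.fullmatch(password) is not None

print(is_strong("Passw0rdX"))   # has upper, lower, digit, and 9 characters
print(is_strong("password1"))   # no uppercase letter
```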

7. Character Classes and Shorthand

Character classes let you define a set of characters, any one of which can match at a given position. They are enclosed in square brackets.

Basic Character Classes

# [abc]      — matches "a", "b", or "c"
# [a-z]      — matches any lowercase letter
# [A-Z]      — matches any uppercase letter
# [0-9]      — matches any digit
# [a-zA-Z]   — matches any letter
# [a-zA-Z0-9] — matches any alphanumeric character

# Ranges can be combined:
# [a-z0-9_]  — lowercase letters, digits, and underscore
# [a-fA-F0-9] — hexadecimal characters

# Special characters INSIDE brackets mostly lose their special meaning:
# [.+*?]     — matches a literal dot, plus, star, or question mark
# BUT these are still special inside brackets:
# ]  — closes the class (escape as \] or put first: []abc])
# \  — escape character
# ^  — negation when first (escape as \^ or put not-first: [a^b])
# -  — range when between characters (escape as \- or put first/last: [-abc] or [abc-])

Negated Character Classes

# [^abc]     — matches any character EXCEPT "a", "b", or "c"
# [^0-9]     — matches any non-digit character
# [^a-zA-Z]  — matches any non-letter character
# [^\s]      — matches any non-whitespace character

# Common use: match everything between delimiters
# "[^"]*"    — matches a double-quoted string (no quotes inside)
# '[^']*'    — matches a single-quoted string
# <[^>]+>   — matches an HTML tag

# Negated classes are almost always faster than lazy quantifiers:
# "[^"]*" is faster than ".*?" because it does not backtrack

Shorthand Character Classes

Shorthand	Equivalent	Meaning
`\d`	`[0-9]`	Any digit
`\D`	`[^0-9]`	Any non-digit
`\w`	`[a-zA-Z0-9_]`	Any word character
`\W`	`[^a-zA-Z0-9_]`	Any non-word character
`\s`	`[ \t\n\r\f\v]`	Any whitespace character
`\S`	`[^ \t\n\r\f\v]`	Any non-whitespace character

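# Note: Unicode behavior depends on the engine:
# Python 3 (str patterns): \d, \w, and \s are Unicode-aware by default,
#   so \d also matches non-ASCII digits (e.g., Arabic-Indic digits); re.ASCII restricts them
# JavaScript: \d and \w stay ASCII-only even with /u; use \p{Nd} or \p{L} with /u instead
# \s matches Unicode whitespace in both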

# If you need strictly ASCII behavior, use explicit ranges:
# [0-9] instead of \d
# [a-zA-Z0-9_] instead of \w

# Combining shorthand in classes:
# [\d\s]     — matches a digit OR whitespace
# [\w-]      — matches a word character OR hyphen (useful for identifiers)
# [^\d\s]    — matches anything that is NOT a digit AND NOT whitespace
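
How \d treats non-ASCII digits is worth checking in your own engine. The sketch below assumes Python 3's re module, where str patterns are Unicode-aware by default and re.ASCII opts out. The sample text is hypothetical and contains the Arabic-Indic digits for 42 followed by the ASCII digits.

```python
import re

text = "room \u0664\u0662 and 42"

unicode_digits = re.findall(r'\d+', text)                 # default for str patterns
ascii_digits = re.findall(r'\d+', text, flags=re.ASCII)   # \d restricted to 0-9
explicit_range = re.findall(r'[0-9]+', text)              # ASCII regardless of flags

print(unicode_digits)
print(ascii_digits)
print(explicit_range)
```

When the digits feed into numeric parsing or an identifier format, the explicit range or re.ASCII is usually the safer choice.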

POSIX Character Classes

# POSIX classes are used in tools like grep, sed, and awk (NOT JavaScript):
# [:alpha:]  — alphabetic characters [a-zA-Z]
# [:digit:]  — digits [0-9]
# [:alnum:]  — alphanumeric [a-zA-Z0-9]
# [:space:]  — whitespace characters
# [:upper:]  — uppercase letters [A-Z]
# [:lower:]  — lowercase letters [a-z]
# [:punct:]  — punctuation characters
# [:xdigit:] — hexadecimal digits [0-9a-fA-F]

# They must be used INSIDE a bracket expression:
# grep '[[:digit:]]' file.txt   (correct)
# grep '[:digit:]' file.txt     (WRONG — matches ":", "d", "i", "g", "t")

8. Flags and Modifiers

Flags (also called modifiers) change how the regex engine interprets your pattern. They are specified differently in each language, but the concepts are universal.

Flag	Name	Effect	Supported In
`g`	Global	Match all occurrences, not just the first	JavaScript, sed, Perl
`i`	Case-insensitive	`A` matches `a` and vice versa	All engines
`m`	Multiline	`^` and `$` match at line boundaries as well as string boundaries	All engines
`s`	Dotall / Single-line	`.` matches newline characters too	JS (ES2018+), Python, Java, PCRE
`u`	Unicode	Enable full Unicode matching, treat pattern as Unicode code points	JavaScript (ES6+)
`y`	Sticky	Match only at `lastIndex` position (no global searching)	JavaScript (ES6+)
`x`	Extended / Verbose	Ignore whitespace and allow comments in pattern	Python, Perl, PCRE, Ruby, Java

# JavaScript flags (appended to the regex literal):
/pattern/g        // global
/pattern/i        // case-insensitive
/pattern/m        // multiline
/pattern/s        // dotall
/pattern/u        // unicode
/pattern/y        // sticky
/pattern/gims     // combined

# Python flags:
re.IGNORECASE  (re.I)
re.MULTILINE   (re.M)
re.DOTALL      (re.S)
re.VERBOSE     (re.X)
re.UNICODE     (re.U)
re.search(r'pattern', text, re.IGNORECASE | re.MULTILINE)

# Python verbose mode example — readable complex patterns:
pattern = re.compile(r'''
    (?P<year>\d{4})    # Year (4 digits)
    -                   # Separator
    (?P<month>\d{2})   # Month (2 digits)
    -                   # Separator
    (?P<day>\d{2})     # Day (2 digits)
''', re.VERBOSE)

# Inline flags (supported in PCRE, Python, Java):
# (?i)pattern     — case-insensitive for this pattern
# (?m)pattern     — multiline mode
# (?s)pattern     — dotall mode
# (?x)pattern     — verbose mode
# (?i:abc)        — case-insensitive only within this group
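
The difference between a global inline flag and a scoped one is easiest to see on a concrete string. A minimal sketch, assuming Python's re module (3.6 or later for the scoped form) and a made-up sample:

```python
import re

text = "HELLO World, hello world"

# Scoped: only "hello" ignores case; " World" must match exactly.
scoped = re.findall(r'(?i:hello) World', text)

# Global: the flag at the start applies to the entire pattern.
whole = re.findall(r'(?i)hello World', text)

print(scoped)
print(whole)
```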

9. Common Regex Patterns

These are battle-tested patterns for the most frequently needed validations and extractions. Each pattern includes notes on edge cases and limitations.

Email Address

# Simple (covers most real-world addresses):
[\w.+-]+@[\w-]+\.[\w.-]+

# More strict (RFC 5322 subset):
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# Examples that match:
# user@example.com
# first.last+tag@sub.domain.co.uk
# user123@company.org

# Note: A fully RFC-compliant email regex is thousands of characters long.
# In practice, the simple version above is sufficient. For production
# validation, send a verification email instead of relying on regex.

URL

# HTTP/HTTPS URLs:
https?://[^\s<>"']+

# More complete (with optional port, path, query, fragment):
https?://[\w-]+(\.[\w-]+)+(:\d+)?(/[\w./-]*)?(\?\S+)?(#\S+)?

# Examples that match:
# https://example.com
# http://sub.domain.com:8080/path/to/page?q=search#section
# https://api.example.com/v2/users

# Match URLs with any protocol:
[a-zA-Z][a-zA-Z0-9+.-]*://[^\s<>"']+

IP Address (IPv4)

# Basic (allows invalid octets like 999):
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

# Strict (validates each octet is 0-255):
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# The strict version breaks down:
# 25[0-5]     — matches 250-255
# 2[0-4]\d    — matches 200-249
# [01]?\d\d?  — matches 0-199
# \.          — literal dot between octets
# {3}         — three octets followed by dot
# Then one final octet without trailing dot

# Examples that match: 192.168.1.1, 10.0.0.0, 255.255.255.255
# The strict version rejects: 999.999.999.999, 256.1.1.1
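
For validating a whole string rather than searching inside text, the strict pattern can be wrapped in a helper. This is a sketch assuming Python's re module; OCTET, IPV4, and is_ipv4 are hypothetical names. It builds the same octet alternation once and relies on fullmatch, so the \b anchors are not needed.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (same alternation as the strict pattern).
OCTET = r'(?:25[0-5]|2[0-4]\d|[01]?\d\d?)'

# One octet, then exactly three ".octet" repetitions; re.ASCII keeps \d to 0-9.
IPV4 = re.compile(rf'{OCTET}(?:\.{OCTET}){{3}}', flags=re.ASCII)

def is_ipv4(candidate: str) -> bool:
    """Return True when the entire string is a dotted quad with octets 0-255."""
    return IPV4.fullmatch(candidate) is not None

print(is_ipv4("192.168.1.1"))
print(is_ipv4("256.1.1.1"))
```

In Python specifically, the standard library's ipaddress module is an alternative when you want parsing as well as validation.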

Phone Number (US Format)

# Flexible (handles many common formats):
\+?1?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})

# Matches:
# (555) 123-4567
# 555-123-4567
# 555.123.4567
# +1 555 123 4567
# 15551234567

# International (E.164 format: "+", then up to 15 digits, no leading zero):
\+[1-9]\d{1,14}

# Note: Phone validation is complex because formats vary by country.
# For production, use a library like Google's libphonenumber.

Date (Various Formats)

# YYYY-MM-DD (ISO 8601):
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

# MM/DD/YYYY:
(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}

# DD.MM.YYYY (European):
(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.\d{4}

# Flexible (matches most common separators):
\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}

# Note: These patterns validate format but not logic.
# They will match "2026-02-31" (February 31st doesn't exist).
# Use date parsing libraries for true validation.
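
One way to combine the two checks is to let the regex filter on format and a date parser decide whether the day exists. The sketch below assumes Python's re and datetime modules; ISO_DATE and is_real_date are hypothetical names.

```python
import re
from datetime import datetime

# Format check only: YYYY-MM-DD with month 01-12 and day 01-31.
ISO_DATE = re.compile(r'\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])', flags=re.ASCII)

def is_real_date(candidate: str) -> bool:
    """Return True when the string has ISO format and names a calendar day."""
    if ISO_DATE.fullmatch(candidate) is None:
        return False
    try:
        # strptime raises ValueError for impossible dates such as February 31st.
        datetime.strptime(candidate, '%Y-%m-%d')
    except ValueError:
        return False
    return True

print(is_real_date("2026-02-11"))
print(is_real_date("2026-02-31"))
```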

Password Strength

# Minimum 8 characters, at least one uppercase, lowercase, and digit:
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$

# Add requirement for special character:
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*()_+=-]).{8,}$

# Require 12+ characters, no spaces:
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?!.*\s).{12,}$

# Breakdown:
# ^              — start of string
# (?=.*[A-Z])    — lookahead: at least one uppercase letter
# (?=.*[a-z])    — lookahead: at least one lowercase letter
# (?=.*\d)       — lookahead: at least one digit
# (?!.*\s)       — negative lookahead: no whitespace
# .{8,}          — at least 8 characters
# $              — end of string

Hex Color Code

# 3 or 6 digit hex color:
#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b

# 8-digit hex with alpha:
#[0-9a-fA-F]{8}\b

# Any hex color (3, 4, 6, or 8 digits; a plain {3,8} would also accept 5 and 7):
#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})\b

# Matches: #fff, #FF0000, #3b82f6, #3b82f680

HTML Tag

# Any HTML tag (opening or closing):
</?[\w-]+(\s+[\w-]+(=("[^"]*"|'[^']*'|[\w-]+))?)*\s*/?>

# Self-closing tags:
<[\w-]+(\s+[\w-]+(="[^"]*")?)*\s*/>

# Warning: Do NOT use regex to parse HTML in production.
# HTML is not a regular language and regex cannot handle:
# - Nested tags of the same type
# - CDATA sections
# - Conditional comments
# - Malformed HTML
# Use a DOM parser instead. Regex is fine for quick text processing tasks.

10. Regex in Different Languages

Every major programming language supports regular expressions, but the APIs and some syntax details differ. Here is how to use regex in six popular languages.

JavaScript

// Creating regex:
const re1 = /pattern/flags;              // Regex literal
const re2 = new RegExp('pattern', 'flags'); // Constructor (dynamic patterns)

// Testing for a match:
/\d+/.test("abc 123");                   // true
"abc 123".match(/\d+/);                  // ["123"]
"abc 123 def 456".matchAll(/\d+/g);      // Iterator of all matches

// Replacing:
"hello".replace(/l/g, "r");              // "herro"
"hello".replaceAll(/l/g, "r");           // "herro" (ES2021+)

// Splitting:
"one,two,,four".split(/,+/);             // ["one", "two", "four"]

// Extracting groups:
const m = /(\d{4})-(\d{2})/.exec("2026-02");
m[1]; // "2026"
m[2]; // "02"

// Named groups (ES2018+):
const m2 = /(?<year>\d{4})-(?<month>\d{2})/.exec("2026-02");
m2.groups.year;  // "2026"
m2.groups.month; // "02"

// Sticky flag (match at exact position):
const re = /\d+/y;
re.lastIndex = 4;
re.exec("abc 123"); // ["123"] (matches at index 4)

Python

import re

# Basic matching:
re.search(r'\d+', 'abc 123')          # Match object (first match)
re.findall(r'\d+', 'abc 123 def 456') # ['123', '456'] (all matches)
re.finditer(r'\d+', 'abc 123')        # Iterator of match objects

# Replacing:
re.sub(r'\d+', 'NUM', 'abc 123 def 456')   # 'abc NUM def NUM'
re.sub(r'(\w+)', r'\1!', 'hello world')     # 'hello! world!'
re.subn(r'\d+', 'X', 'a1 b2 c3')            # ('aX bX cX', 3)

# Splitting:
re.split(r'[,;\s]+', 'one, two; three four') # ['one', 'two', 'three', 'four']

# Compiled patterns (recommended for repeated use):
pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = pattern.search('Date: 2026-02-11')
m.group('year')   # '2026'
m.group('month')  # '02'
m.group('day')    # '11'
m.group(0)        # '2026-02-11' (entire match)

# Flags:
re.search(r'hello', 'Hello World', re.IGNORECASE)  # Matches
re.findall(r'^\w+', text, re.MULTILINE)             # Match at each line start

# Verbose mode for readable patterns:
email_re = re.compile(r'''
    [\w.+-]+       # Username
    @              # @ symbol
    [\w-]+         # Domain name
    \.             # Dot
    [\w.-]+        # TLD
''', re.VERBOSE)

# fullmatch (Python 3.4+) — must match the ENTIRE string:
re.fullmatch(r'\d{4}', '2026')   # Match
re.fullmatch(r'\d{4}', '2026!')  # None

Go

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Compile pattern (panics on invalid regex):
    re := regexp.MustCompile(`\d+`)

    // Test for match:
    fmt.Println(re.MatchString("abc 123")) // true

    // Find first match:
    fmt.Println(re.FindString("abc 123 def 456")) // "123"

    // Find all matches:
    fmt.Println(re.FindAllString("abc 123 def 456", -1)) // [123 456]

    // Replace:
    fmt.Println(re.ReplaceAllString("abc 123", "NUM")) // "abc NUM"

    // Replace with function:
    result := re.ReplaceAllStringFunc("abc 123 def 456", func(s string) string {
        return "[" + s + "]"
    })
    fmt.Println(result) // "abc [123] def [456]"

    // Submatch (capture groups):
    dateRe := regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
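    match := dateRe.FindStringSubmatch("Date: 2026-02-11")
    // Print the groups so that match is used (Go rejects unused variables):
    fmt.Println(match[0]) // "2026-02-11" (entire match)
    fmt.Println(match[1]) // "2026"
    fmt.Println(match[2]) // "02"
    fmt.Println(match[3]) // "11"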

    // Named groups:
    namedRe := regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})`)
    m := namedRe.FindStringSubmatch("2026-02")
    for i, name := range namedRe.SubexpNames() {
        if name != "" {
            fmt.Printf("%s: %s\n", name, m[i])
        }
    }

    // IMPORTANT: Go uses RE2 syntax, which does NOT support:
    // - Lookahead (?=), (?!)
    // - Lookbehind (?<=), (?<!)
    // - Backreferences \1
    // - Possessive quantifiers
    // This is by design — RE2 guarantees linear-time matching.
}

Java

import java.util.regex.*;

public class RegexExample {
    public static void main(String[] args) {
        // Compile pattern:
        Pattern pattern = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
        Matcher matcher = pattern.matcher("Date: 2026-02-11");

        // Find and extract groups:
        if (matcher.find()) {
            System.out.println(matcher.group(0)); // "2026-02-11"
            System.out.println(matcher.group(1)); // "2026"
            System.out.println(matcher.group(2)); // "02"
            System.out.println(matcher.group(3)); // "11"
        }

        // Named groups:
        Pattern named = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");
        Matcher m = named.matcher("2026-02");
        if (m.find()) {
            System.out.println(m.group("year"));  // "2026"
            System.out.println(m.group("month")); // "02"
        }

        // Replace:
        String result = "abc 123 def 456".replaceAll("\\d+", "NUM");
        // "abc NUM def NUM"

        // Replace first only:
        String first = "abc 123 def 456".replaceFirst("\\d+", "NUM");
        // "abc NUM def 456"

        // Split:
        String[] parts = "one, two, three".split(",\\s*");
        // ["one", "two", "three"]

        // Flags:
        Pattern ci = Pattern.compile("hello", Pattern.CASE_INSENSITIVE);
        Pattern ml = Pattern.compile("^\\w+", Pattern.MULTILINE);
        Pattern combined = Pattern.compile("pattern",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

        // Java supports: lookahead, lookbehind, atomic groups,
        // possessive quantifiers, Unicode categories, and more.
        // Backslashes must be doubled in Java strings: \d becomes \\d
    }
}

PHP

<?php
// PHP uses PCRE (Perl-Compatible Regular Expressions)
// Patterns must be enclosed in delimiters (usually /):

// Match:
preg_match('/(\d{4})-(\d{2})-(\d{2})/', 'Date: 2026-02-11', $matches);
// $matches[0] = "2026-02-11"
// $matches[1] = "2026"
// $matches[2] = "02"
// $matches[3] = "11"

// Match all:
preg_match_all('/\d+/', 'abc 123 def 456', $matches);
// $matches[0] = ["123", "456"]

// Replace:
$result = preg_replace('/\d+/', 'NUM', 'abc 123 def 456');
// "abc NUM def NUM"

// Replace with callback:
$result = preg_replace_callback('/\d+/', function($m) {
    return intval($m[0]) * 2;
}, 'price: 50, tax: 8');
// "price: 100, tax: 16"

// Named groups:
preg_match('/(?P<year>\d{4})-(?P<month>\d{2})/', '2026-02', $m);
echo $m['year'];  // "2026"
echo $m['month']; // "02"

// Split:
$parts = preg_split('/[,;\s]+/', 'one, two; three');
// ["one", "two", "three"]

// Flags (after closing delimiter):
preg_match('/hello/i', 'Hello World', $m); // case-insensitive
preg_match('/^word/m', $text, $m);         // multiline
preg_match('/start.*end/s', $text, $m);    // dotall

// PHP's PCRE engine supports lookaround, atomic groups,
// possessive quantifiers, Unicode properties (\p{L}), and recursion (?R)
?>

Ruby

# Ruby has first-class regex support with the =~ operator:

# Match:
"abc 123" =~ /\d+/      # 4 (index of match)
$&                       # "123" (matched text)

# Match with named groups:
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "2026-02"
  puts year   # "2026" (local variables created automatically!)
  puts month  # "02"
end

# MatchData object:
m = /(\d{4})-(\d{2})-(\d{2})/.match("2026-02-11")
m[0]  # "2026-02-11"
m[1]  # "2026"
m[2]  # "02"

# Replace:
"abc 123 def 456".gsub(/\d+/, "NUM")   # "abc NUM def NUM" (all matches)
"abc 123 def 456".sub(/\d+/, "NUM")    # "abc NUM def 456" (first only)

# Replace with block:
"abc 123".gsub(/\d+/) { |m| m.to_i * 2 }  # "abc 246"

# Replace with hash:
"cat and dog".gsub(/cat|dog/, "cat" => "feline", "dog" => "canine")
# "feline and canine"

# Scan (find all matches):
"abc 123 def 456".scan(/\d+/)       # ["123", "456"]

# Split:
"one, two, three".split(/,\s*/)     # ["one", "two", "three"]

# Flags:
/pattern/i    # case-insensitive
/pattern/m    # multiline (. matches newline — like /s in other engines!)
/pattern/x    # extended (verbose mode with comments)

# Note: Ruby's /m flag is equivalent to /s (dotall) in other engines.
# Ruby's ^ and $ ALWAYS match line boundaries (multiline by default).
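
For contrast, here is a minimal Python sketch of the same flag distinction. In Python's re module, re.MULTILINE controls whether ^ and $ match at line boundaries, and re.DOTALL controls whether the dot matches a newline. The sample text and variable names are made up for illustration.

```python
import re

text = "first\nsecond"

# Without flags, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)

# re.MULTILINE lets ^ match after each newline (Ruby does this by default).
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Without re.DOTALL, "." refuses to cross the newline.
dot_default = re.search(r"first.second", text)

# re.DOTALL lets "." match the newline (the behavior Ruby calls /m).
dot_all = re.search(r"first.second", text, re.DOTALL)

print(default_starts)        # ['first']
print(multiline_starts)      # ['first', 'second']
print(dot_default is None)   # True
print(dot_all is not None)   # True
```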

11. Performance Tips and Catastrophic Backtracking

Most regex patterns run in milliseconds. But poorly written patterns can take minutes, hours, or effectively forever on certain inputs. Understanding why this happens and how to prevent it is essential for writing production-quality regex.

How Regex Engines Work

Most regex engines use NFA (Nondeterministic Finite Automaton) backtracking. When the engine encounters a choice point (alternation, quantifier), it tries one option. If the rest of the pattern fails, it backtracks to the choice point and tries the next option. This process is usually fast, but certain patterns create an exponential number of paths to try.

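A notable exception is Go's regexp package, which accepts RE2 syntax and guarantees matching in time linear in the size of the input instead of backtracking. This is why Go's regex does not support lookaround or backreferences: RE2 leaves out constructs for which only backtracking implementations are known.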

Catastrophic Backtracking

# The classic example:
# Pattern: (a+)+b
# Input:   "aaaaaaaaaaaaaaaaac" (16 a's followed by c)

# The engine tries every possible way to distribute the a's among
# the inner a+ and the outer ()+ before determining no match exists.
# With 16 a's, that is 2^16 = 65,536 combinations.
# With 32 a's, that is 2^32 = 4,294,967,296 combinations.

# More realistic dangerous patterns:
(\w+\s*)+;              # Parsing a line ending with semicolon
(.*?,){11}P             # Matching CSV with 12 fields
(\d+\.?\d*|\.\d+)+      # Matching numbers (overlapping alternatives)

# How to recognize the problem:
# 1. Nested quantifiers: (a+)+, (a*)+, (a+)*
# 2. Overlapping alternatives: (\d+|\d+\.\d+)
# 3. Open-ended patterns with backtracking: .* followed by a specific token

# The pattern performs fine on MATCHING input.
# The problem is NON-MATCHING input, where the engine exhausts all paths.
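
To make the doubling concrete, the following is a toy model, not a real regex engine. It walks the choices a backtracking matcher would face for (a+)+b on n a's followed by c, from a single starting position, and counts how many times the final b is tried and fails. The function name count_failed_attempts is hypothetical.

```python
def count_failed_attempts(n: int) -> int:
    """Count failed attempts to match the final 'b' of (a+)+b
    against n 'a' characters followed by 'c' (one start position only)."""
    attempts = 0

    def outer_group(pos: int) -> None:
        nonlocal attempts
        # The inner a+ is greedy: take the longest run first, then give
        # back one character at a time.
        for end in range(n, pos, -1):
            if end < n:
                # Some 'a's remain, so the outer + tries another repetition.
                outer_group(end)
            # Now try to match 'b' at position `end`. The text holds an 'a'
            # or the trailing 'c' there, so the attempt always fails.
            attempts += 1

    outer_group(0)
    return attempts


for n in (4, 8, 16):
    print(n, count_failed_attempts(n))  # 15, 255, 65535 (that is, 2**n - 1)
```

Each extra character doubles the count, and a real search repeats similar work from later starting positions as well.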

Fixes for Catastrophic Backtracking

# Fix 1: Eliminate nested quantifiers
# Bad:    (a+)+b
# Good:   a+b

# Fix 2: Use atomic groups (PCRE, Java, .NET, Ruby, Python 3.11+)
# Atomic groups never backtrack once matched:
# Bad:    (a+)+b
# Good:   (?>a+)+b   (atomic group — no backtracking into the group)

# Fix 3: Use possessive quantifiers (PCRE, Java, Ruby, Python 3.11+)
# Bad:    \d+\.\d+
# Good:   \d++\.\d++   (possessive — won't give back matched digits)

# Fix 4: Use specific character classes instead of .
# Bad:    ".*"  (dot matches everything, has to backtrack a lot)
# Good:   "[^"]*"  (negated class, no backtracking needed)

# Fix 5: Make alternatives non-overlapping
# Bad:    (\d+|\d+\.\d+)     (both branches start with \d+)
# Good:   (\d+(?:\.\d+)?)    (single branch; the fraction is optional)

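# Fix 6: Anchor your patterns when validating a whole string
# Bad:    \d{4}-\d{2}-\d{2}   (a search retries the match at every position)
# Good:   ^\d{4}-\d{2}-\d{2}$ (fails immediately at wrong positions)
# Note: anchoring changes what matches; use it for validation, not for
# finding a date inside longer text.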
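
A rewrite is only a fix if it still matches the same strings. The sketch below, which assumes Python's re module and a handful of made-up sample inputs, checks that a flattened pattern agrees with the nested one and that a single-branch decimal pattern agrees with an overlapping alternation. The helper names span and full are hypothetical.

```python
import re

NESTED = re.compile(r"(a+)+b")   # nested quantifier (vulnerable)
FLAT = re.compile(r"a+b")        # one quantifier, same set of matches

OVERLAPPING = re.compile(r"(?:\d+\.\d+|\d+)")  # branches share a \d+ prefix
SINGLE = re.compile(r"\d+(?:\.\d+)?")          # one branch, optional fraction


def span(pattern, text):
    """Return the (start, end) of the first match, or None."""
    m = pattern.search(text)
    return m.span() if m else None


def full(pattern, text):
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


samples = ["ab", "aaab", "xxaab", "b", "aaac", ""]
print(all(span(NESTED, s) == span(FLAT, s) for s in samples))  # True

numbers = ["42", "3.14", "3.", ".5", "1.2.3", ""]
print(all(full(OVERLAPPING, s) == full(SINGLE, s) for s in numbers))  # True

# The flat pattern gives up quickly even on a long non-matching input.
print(span(FLAT, "a" * 2000 + "c"))  # None
```

Agreement on a few samples is not a proof of equivalence, but it catches the common mistake of a "fixed" pattern that quietly accepts different input.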

Performance Best Practices

# 1. Compile and reuse patterns
# Python:
pattern = re.compile(r'\d+')  # Compile once
for line in millions_of_lines:
    pattern.findall(line)      # Reuse compiled pattern

# JavaScript:
const re = /\d+/g;
// With the /g (or /y) flag, test() and exec() resume from re.lastIndex,
// so reset it before reusing the same object on a new string:
re.lastIndex = 0;

# 2. Use non-capturing groups when you don't need captures
# Slower:  (https?://)(www\.)?(\w+\.com)   (3 capture groups)
# Faster:  (?:https?://)(?:www\.)?(\w+\.com)   (1 capture group)

# 3. Order alternations by frequency
# If "http" appears much more often than "ftp":
# Better:  (?:http|ftp)://
# Worse:   (?:ftp|http)://

# 4. Fail fast — put the most distinctive part first
# Bad:    .*specific_string   (scans entire string before testing)
# Better: specific_string     (the engine can use optimizations)

# 5. Avoid .* at the start of a pattern
# Bad:    .*@example\.com$   (matches the whole string, then backtracks)
# Good:   \S+@example\.com$  (more constrained, less backtracking)

# 6. Use string methods for simple operations
# For literal substring search, don't use regex:
# Python: "foo" in text    (much faster than re.search(r'foo', text))
# JS:     text.includes("foo")  (faster than /foo/.test(text))

# 7. For large files, use streaming tools
# sed and awk process line by line, never loading the entire file.
# In Python, iterate over file lines instead of reading all at once:
with open('huge.log') as f:
    for line in f:
        if pattern.search(line):
            process(line)
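
Tips 1, 2, and 7 combine naturally. The following is a minimal sketch, assuming a hypothetical log format and hypothetical names (ERROR_RE, matching_lines): the pattern is compiled once, uses a non-capturing group, and is applied lazily to one line at a time.

```python
import io
import re
from typing import Iterable, Iterator

# Compiled once at import time. The non-capturing group avoids storing
# a capture that is never read. The log format is hypothetical.
ERROR_RE = re.compile(r"\b(?:ERROR|FATAL)\b")


def matching_lines(lines: Iterable[str], pattern: "re.Pattern[str]") -> Iterator[str]:
    """Lazily yield lines that contain a match, without the trailing newline."""
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# io.StringIO stands in for an open file; both iterate line by line.
log = io.StringIO("INFO started\nERROR disk full\nDEBUG tick\nFATAL crash\n")
hits = list(matching_lines(log, ERROR_RE))
print(hits)  # ['ERROR disk full', 'FATAL crash']
```

Because matching_lines is a generator, memory use stays proportional to the longest line rather than to the size of the file.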

Testing for Performance Issues

# Create adversarial test inputs to check for backtracking:

# For pattern (a+)+b, test with:
# "aaaaaaaaaaaaaaaa" (no 'b' at end — triggers backtracking)
# Increase the number of a's and measure execution time.
# If time grows exponentially, you have catastrophic backtracking.

# Python timing test:
import re, time

pattern = re.compile(r'(a+)+b')
for n in [15, 20, 25, 30]:
    text = 'a' * n + 'c'
    start = time.perf_counter()  # monotonic, high-resolution timer
    pattern.search(text)
    elapsed = time.perf_counter() - start
    print(f"n={n}: {elapsed:.4f}s")

# If you see:
# n=15: 0.001s
# n=20: 0.05s
# n=25: 1.5s
# n=30: 45s
# That is exponential growth — the pattern has catastrophic backtracking.

# JavaScript (run this in Node.js or a Web Worker: matching is synchronous,
# so a timer cannot interrupt it and a slow pattern freezes the page):
const start = performance.now();
/(a+)+b/.test('a'.repeat(25) + 'c');
console.log(`${performance.now() - start}ms`);
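
The timing test above can be wrapped in a small reusable harness. This is a sketch under stated assumptions: the function names (best_time, growth_profile, adversarial) are hypothetical, the input sizes are deliberately small so that even the vulnerable pattern finishes quickly, and absolute timings will vary by machine.

```python
import re
import time


def best_time(pattern, text, repeats=3):
    """Return the fastest of `repeats` timed searches, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pattern.search(text)
        best = min(best, time.perf_counter() - start)
    return best


def growth_profile(pattern, sizes, make_input):
    """Time the pattern on inputs of increasing size.

    Returns a list of (size, seconds) pairs. The caller keeps sizes small
    so that a vulnerable pattern still finishes.
    """
    return [(n, best_time(pattern, make_input(n))) for n in sizes]


def adversarial(n):
    """n 'a' characters followed by 'c', so neither pattern can match."""
    return "a" * n + "c"


risky = growth_profile(re.compile(r"(a+)+b"), [8, 12, 16], adversarial)
safe = growth_profile(re.compile(r"a+b"), [8, 12, 16], adversarial)

for (n, slow), (_, fast) in zip(risky, safe):
    print(f"n={n}: nested {slow:.6f}s, flat {fast:.6f}s")
```

Taking the best of several runs reduces timer noise. If the nested column grows by a roughly constant factor for each fixed increase in n while the flat column barely moves, the nested pattern is the one to rewrite.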

⚙ Debug performance: Use our Regex Debugger to visualize how many steps the engine takes to match your pattern. High step counts on non-matching input indicate potential backtracking problems.

Frequently Asked Questions

What is the difference between greedy and lazy quantifiers in regex?

Greedy quantifiers (*, +, ?, {n,m}) match as much text as possible, then backtrack if the rest of the pattern fails. Lazy (non-greedy) quantifiers (*?, +?, ??, {n,m}?) match as little text as possible, then expand if needed. For example, given the input <b>hello</b> and <b>world</b>, the greedy pattern <b>.*</b> matches the entire string from the first <b> to the last </b>, while the lazy pattern <b>.*?</b> matches <b>hello</b> and <b>world</b> separately. In many cases, a negated character class like [^<]* is both more correct and faster than either approach.
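
The difference is easy to see side by side. Here is a minimal Python sketch, assuming HTML-like input with <b> tags; the sample string is made up for illustration.

```python
import re

html = "<b>hello</b> and <b>world</b>"

# Greedy: .* runs to the end of the string, then backs up to the LAST </b>.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? grows one character at a time and stops at the FIRST </b>.
lazy = re.findall(r"<b>.*?</b>", html)

# Negated class: [^<]* cannot cross a "<", so it never overshoots.
negated = re.findall(r"<b>[^<]*</b>", html)

print(greedy)   # ['<b>hello</b> and <b>world</b>']
print(lazy)     # ['<b>hello</b>', '<b>world</b>']
print(negated)  # ['<b>hello</b>', '<b>world</b>']
```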

How do lookahead and lookbehind assertions work in regular expressions?

Lookahead and lookbehind are zero-width assertions that match a position in the string without consuming characters. Positive lookahead (?=pattern) matches a position followed by the specified pattern. Negative lookahead (?!pattern) matches a position NOT followed by the pattern. Positive lookbehind (?<=pattern) matches a position preceded by the pattern. Negative lookbehind (?<!pattern) matches a position NOT preceded by the pattern. For example, \d+(?= dollars) matches numbers followed by " dollars" without including " dollars" in the match. They are supported in JavaScript (lookahead in every version, lookbehind since ES2018), Python, Java, .NET, PCRE (PHP), Ruby (Oniguruma/Onigmo), and most modern regex engines. Go's RE2-based engine does not support them.
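
All four assertions can be tried on one string. This is a minimal Python sketch; the sample text is invented for illustration.

```python
import re

text = "100 dollars, 50 euros, $30 and 40"

# Positive lookahead: digits followed by " dollars" (not included in the match).
followed = re.findall(r"\d+(?= dollars)", text)

# Negative lookahead: whole numbers NOT followed by " dollars".
not_followed = re.findall(r"\b\d+\b(?! dollars)", text)

# Positive lookbehind: digits preceded by a dollar sign.
preceded = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
not_preceded = re.findall(r"(?<!\$)\b\d+", text)

print(followed)      # ['100']
print(not_followed)  # ['50', '30', '40']
print(preceded)      # ['30']
print(not_preceded)  # ['100', '50', '40']
```

The \b word boundaries in the negative cases matter: without them the engine would backtrack to a shorter run of digits, such as "10" from "100", that technically satisfies the assertion.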

What causes catastrophic backtracking in regex and how do I prevent it?

Catastrophic backtracking occurs when a regex engine tries exponentially many paths through a pattern before determining that no match exists. It is caused by nested quantifiers on overlapping patterns, such as (a+)+ or (.*a){10}. On non-matching input, the engine backtracks through every possible combination of how characters can be distributed among the quantifiers. To prevent it: avoid nested quantifiers where the inner pattern can match the same characters as the outer one; use atomic groups (?>...) or possessive quantifiers (++, *+) when available; prefer specific character classes like [^<]* instead of .*; make alternation branches non-overlapping; and test your patterns against long non-matching input to check for exponential slowdowns.

Conclusion

Regular expressions are one of the few skills that remain relevant across every programming language, every editor, and every decade of software development. They were useful in the 1970s, they are useful today, and they will still be useful in 2036. The syntax is stable. The concepts transfer across tools. And the ability to describe patterns in text is a fundamental computing capability that no AI or framework replaces.

If you are just starting with regex, begin with literals and character classes. Add quantifiers and anchors once those feel natural. Move to groups and backreferences when you need to capture and rearrange text. Save lookaround assertions for when you genuinely need them. And always, always test on small inputs before applying a pattern to your production data.

The patterns and techniques in this guide cover the vast majority of what you will encounter in daily development. Bookmark it, revisit it when you encounter a tricky pattern, and most importantly, practice. Every regex you write makes the next one easier.

⚙ Start practicing: Test patterns in the Regex Tester, run find-and-replace with the Regex Replace tool, debug complex expressions in the Regex Debugger, or browse ready-made patterns in the Regex Library.

Learn More

Regex Find and Replace: The Complete Guide — deep dive into replacement patterns, backreferences, and 20+ practical recipes for editors, command line, and code
Regex Cheat Sheet — one-page quick reference for all metacharacters, quantifiers, flags, and shorthand classes
Interactive Regex Cheatsheet Tool — searchable cheatsheet with live examples you can edit