
Regular Expressions

Regex Basics (Regular Expressions)
  • Regex is a pattern language used to search, match, and replace text.
  • Used in validation (email/phone), text processing, log analysis, scraping, etc.
  • Python provides regex support through the built-in re module.
import re

text = "I love Python"
m = re.search(r"Python", text)   # returns a match object, or None if nothing matches
print(m.group())                 # Python
re Module (Common Functions)
  • re.search() → finds the first match anywhere in the string
  • re.match() → matches only at the start of the string
  • re.findall() → returns a list of all non-overlapping matches
  • re.finditer() → returns an iterator of match objects
  • re.sub() → replaces matches with a replacement string
  • re.split() → splits the string wherever the pattern matches
import re

s = "ab12 cd34 ef56"
print(re.findall(r"\d+", s))          # ['12', '34', '56']
print(re.sub(r"\d+", "#", s))         # ab# cd# ef#
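The snippet above covers findall and sub. As a minimal sketch of the two remaining list entries that have no example yet, the following uses re.finditer() and re.split() on the same string; the variable names found and parts are only illustrative.

```python
import re

s = "ab12 cd34 ef56"

# finditer yields match objects, so each match's text and position are available
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", s)]
print(found)    # [('12', (2, 4)), ('34', (7, 9)), ('56', (12, 14))]

# split cuts the string wherever the pattern matches (here: runs of whitespace)
parts = re.split(r"\s+", s)
print(parts)    # ['ab12', 'cd34', 'ef56']
```

finditer is usually the better choice when positions matter or the text is large, since it does not build the whole list at once.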
Pattern Matching (search, match, fullmatch)
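  • search: finds the first match anywhere in the string
  • match: the pattern must match at the beginning of the string
  • fullmatch: the whole string must match the pattern
  • All three return None when there is no match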
import re

text = "Python is easy"
print(re.search(r"easy", text).group())                      # easy
print(re.match(r"Python", text).group())                     # Python
print(re.fullmatch(r"Python is easy", text) is not None)     # True
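The calls above all succeed, so .group() is safe there. A short sketch of the failing cases, assuming the same text, shows why the result should be checked before calling .group() on it:

```python
import re

text = "Python is easy"

# "easy" is not at the start, so match() returns None while search() finds it
at_start = re.match(r"easy", text)
anywhere = re.search(r"easy", text)
# "Python" covers only part of the string, so fullmatch() returns None
whole = re.fullmatch(r"Python", text)

print(at_start)            # None
print(anywhere.group())    # easy
print(whole)               # None

# Guard against None before using the match object
m = re.search(r"Java", text)
if m:
    print(m.group())
else:
    print("no match")      # this branch runs
```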
Metacharacters (Core Symbols)
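  • . any character except newline (newline too with re.DOTALL)
  • ^ start of string (also the start of each line with re.MULTILINE)
  • $ end of string (or just before a trailing newline)
  • * 0 or more of the preceding item
  • + 1 or more of the preceding item
  • ? 0 or 1 of the preceding item
  • {m,n} between m and n repetitions of the preceding item
  • [] character set
  • () group
  • | OR (alternation)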
import re

print(re.findall(r"a.", "a1 a2 ab ac"))   # ['a1', 'a2', 'ab', 'ac']
print(re.findall(r"^Hi", "Hi there"))     # ['Hi']
print(re.findall(r"end$", "the end"))     # ['end']
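The example above exercises ., ^ and $. The following sketch covers the quantifiers and alternation from the same list; the sample strings are made up for illustration.

```python
import re

star = re.findall(r"ab*", "a ab abb abbb")        # b repeated 0 or more times
plus = re.findall(r"ab+", "a ab abb abbb")        # b repeated 1 or more times
optional = re.findall(r"colou?r", "color colour") # u is optional
counted = re.findall(r"\d{2,3}", "1 12 123 1234") # 2 to 3 digits, greedy
either = re.findall(r"cat|dog", "cat dog bird")   # one alternative or the other

print(star)       # ['a', 'ab', 'abb', 'abbb']
print(plus)       # ['ab', 'abb', 'abbb']
print(optional)   # ['color', 'colour']
print(counted)    # ['12', '123', '123']
print(either)     # ['cat', 'dog']
```

Note that \d{2,3} takes only the first three digits of "1234"; the leftover "4" is too short to match on its own.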
Character Sets and Ranges
  • [abc] any one of a/b/c
  • [a-z] lowercase letters
  • [0-9] digits
  • [^0-9] any character that is not a digit (^ negates a set only when it comes first)
import re

s = "A1 b2 C3"
print(re.findall(r"[A-Z]", s))        # ['A', 'C']
print(re.findall(r"[0-9]", s))        # ['1', '2', '3']
print(re.findall(r"[^0-9\s]+", s))   # ['A', 'b', 'C']
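Ranges can also be combined inside one set. A small sketch, using a made-up log-style string, of how the combined and negated forms behave:

```python
import re

s = "user_42 logged in at 09:15"

letters = re.findall(r"[A-Za-z]+", s)        # runs of letters only
wordish = re.findall(r"[A-Za-z0-9_]+", s)    # letters, digits and underscore
others = re.findall(r"[^A-Za-z\s]+", s)      # anything that is not a letter or whitespace

print(letters)   # ['user', 'logged', 'in', 'at']
print(wordish)   # ['user_42', 'logged', 'in', 'at', '09', '15']
print(others)    # ['_42', '09:15']
```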
Special Sequences
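  • \d digit, \D non-digit (for str patterns \d also matches other Unicode digits; use re.ASCII to restrict it to 0-9)
  • \w word character (a-zA-Z0-9_, plus Unicode letters and digits unless re.ASCII is set), \W non-word
  • \s whitespace, \S non-whitespace
  • \b word boundary (the position between a word character and a non-word character or the edge of the string)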
import re

t = "Email: test123@gmail.com"
print(re.findall(r"\w+", t))          # ['Email', 'test123', 'gmail', 'com']
print(re.findall(r"\d+", t))          # ['123']
print(re.findall(r"\btest\w+", t))    # ['test123']
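The effect of \b and of the negated sequences is easiest to see side by side. This is a sketch with invented sample strings:

```python
import re

t = "cat concat category cat."

whole_words = re.findall(r"\bcat\b", t)   # "cat" only as a standalone word
any_cat = re.findall(r"cat", t)           # "cat" anywhere, including inside words

print(whole_words)    # ['cat', 'cat']
print(len(any_cat))   # 4

# \S+ grabs runs of non-whitespace; \D removes everything that is not a digit
tokens = re.findall(r"\S+", "a  b\tc\n")
digits_only = re.sub(r"\D", "", "+91 98765-43210")

print(tokens)         # ['a', 'b', 'c']
print(digits_only)    # 919876543210
```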
Groups and Capturing
  • Use () to capture parts of a match.
  • Use group(1), group(2), ... to access captured values; group(0) or group() is the whole match.
import re

date = "2026-01-16"
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", date)

print(m.group(1))   # 2026 (year)
print(m.group(2))   # 01 (month)
print(m.group(3))   # 16 (day)
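Beyond group(n), a match object also offers groups(), and captured groups can be referenced in a re.sub() replacement. The following is a sketch on the same date string; the reordering to day/month/year is just an illustrative choice.

```python
import re

date = "2026-01-16"
pattern = r"(\d{4})-(\d{2})-(\d{2})"
m = re.search(pattern, date)

print(m.group(0))               # 2026-01-16 (the whole match)

# groups() returns all captured values as a tuple, convenient for unpacking
year, month, day = m.groups()
print(year, month, day)         # 2026 01 16

# \1, \2, \3 in the replacement refer to the captured groups
reordered = re.sub(pattern, r"\3/\2/\1", date)
print(reordered)                # 16/01/2026
```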
Practical Regex Example: Validate Mobile Number (India)
  • Validates a 10-digit mobile number whose first digit is 6 to 9.
  • Uses fullmatch so that the complete string must match, with no extra characters before or after.
import re

mobile = input("Enter mobile: ").strip()

if re.fullmatch(r"[6-9]\d{9}", mobile):
    print("Valid mobile number")
else:
    print("Invalid mobile number")
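For testing without keyboard input, the same check can be wrapped in a function. This is a minimal sketch: the names MOBILE_RE and is_valid_mobile are hypothetical, and the re.ASCII flag is an added assumption (not part of the original pattern) so that \d accepts only 0-9 rather than any Unicode digit.

```python
import re

# Hypothetical helper; re.ASCII is an assumption added so \d means 0-9 only.
MOBILE_RE = re.compile(r"[6-9]\d{9}", re.ASCII)

def is_valid_mobile(number: str) -> bool:
    """Return True if number is exactly 10 digits and starts with 6-9."""
    return MOBILE_RE.fullmatch(number.strip()) is not None

samples = ["9876543210", "5876543210", "98765", "98765432101", " 8144305808 "]
results = {s: is_valid_mobile(s) for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

Only the first and last samples pass: the second starts with 5, the third is too short, and the fourth has 11 digits.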
Practical Regex Example: Validate Email
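  • Simple email validation using regex: a local part, an @, a domain, and a top-level domain of at least two letters.
  • Note: Real-world email rules are complex; this pattern is a simplification suited to exercises and exams.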
import re

email = input("Enter email: ").strip()
pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

if re.fullmatch(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
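To see what the simplified pattern accepts and rejects, it can be run over a few sample addresses. This is a sketch only; EMAIL_RE and is_valid_email are hypothetical names, and the sample addresses are invented.

```python
import re

# Same pattern as above, compiled once for reuse (hypothetical names)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_valid_email(email: str) -> bool:
    """Return True if the whole string fits the simplified email pattern."""
    return EMAIL_RE.fullmatch(email.strip()) is not None

samples = ["test123@gmail.com", "admin@site.in", "no-at-sign.com",
           "user@site", "user@site.c", "a@b..com"]
results = {s: is_valid_email(s) for s in samples}

for s, ok in results.items():
    print(s, ok)
```

The last sample, a@b..com, is accepted even though a domain with consecutive dots is not valid, which illustrates the limits of this simple pattern.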
Mini Project: Extract All Emails and Phones from Text
  • Finds all email IDs and mobile numbers in a paragraph.
  • Useful in data cleaning and text mining.
  • \b around the phone pattern keeps it from matching inside a longer run of digits.
import re

text = """
Contact: souravshu562@gmail.com, admin@site.in
Phones: 8144305808, 9876543210
"""

emails = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", text)
phones = re.findall(r"\b[6-9]\d{9}\b", text)

print("Emails:", emails)
print("Phones:", phones)