A Practical Guide to Python Regular Expressions: From re Module Basics to Performance Optimization

Introduction

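Regular expressions are a powerful tool for text processing. Python's re module supports Perl-style regular expression syntax (similar to Perl and PCRE, though not fully compatible with them) and is useful in a wide range of tasks such as log analysis, data cleansing, and validation.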

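Basic patterns

Commonly used metacharacters

Pattern	Meaning	Example
`.`	Any single character except a newline	`a.c` → "abc", "a1c"
`\d`	A digit (`[0-9]` with `re.ASCII`; any Unicode decimal digit by default for `str` patterns)	`\d{3}` → "123"
`\w`	A word character (`[a-zA-Z0-9_]` with `re.ASCII`; Unicode letters, digits, and `_` by default)	`\w+` → "hello_42"
`\s`	A whitespace character	`\s+` → " ", "\t"
`^` / `$`	Start / end of the string (of each line with `re.MULTILINE`)	`^Hello$`
`*` / `+` / `?`	0 or more / 1 or more / 0 or 1 repetitions	`ab*c` → "ac", "abc"
`{n,m}`	Between n and m repetitions	`\d{2,4}` → "12", "1234"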
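
The meaning of `\d` and `\w` in the table depends on whether `re.ASCII` is set. The following is a minimal sketch of that difference; the sample string is made up for illustration.

```python
import re

sample = "order １２３ and 456"  # contains full-width digits (U+FF11..U+FF13)

# Default (Unicode) mode for str patterns: \d matches any Unicode decimal digit
unicode_digits = re.findall(r'\d+', sample)

# re.ASCII restricts \d, \w, and \s to their ASCII-only definitions
ascii_digits = re.findall(r'\d+', sample, flags=re.ASCII)

print(unicode_digits)  # ['１２３', '456']
print(ascii_digits)    # ['456']
```

If the input is expected to contain only ASCII digits (IDs, ports, dates), passing `re.ASCII` is one way to keep look-alike characters out.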

Basic operations

import re

text = "2026-02-26 Error: Connection timeout (retry: 3)"

# match: matches only at the start of the string
m = re.match(r'\d{4}-\d{2}-\d{2}', text)
print(m.group())  # "2026-02-26"

# search: finds the first match anywhere in the string
m = re.search(r'retry: (\d+)', text)
print(m.group(1))  # "3"

# findall: returns all matches as a list
numbers = re.findall(r'\d+', text)
print(numbers)  # ['2026', '02', '26', '3']

# sub: replaces every match
cleaned = re.sub(r'\d{4}-\d{2}-\d{2}', '[DATE]', text)
print(cleaned)  # "[DATE] Error: Connection timeout (retry: 3)"
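
`re.match` and `re.search` return `None` when nothing matches, so calling `.group()` on the result directly can raise `AttributeError`. The sketch below guards against that; `extract_retry_count` is a hypothetical helper name, not part of the re module.

```python
import re

def extract_retry_count(line):
    """Return the retry count as an int, or None if the line has none."""
    m = re.search(r'retry: (\d+)', line)
    return int(m.group(1)) if m else None

print(extract_retry_count("Error: Connection timeout (retry: 3)"))  # 3
print(extract_retry_count("Error: Disk full"))                       # None

# match anchors only at the start; fullmatch requires the whole string to match
starts_with_year = bool(re.match(r'\d{4}', "2026-02-26"))
is_only_year = bool(re.fullmatch(r'\d{4}', "2026-02-26"))
print(starts_with_year, is_only_year)  # True False
```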

Groups and capturing

Named groups

log = "2026-02-26 14:30:45 [ERROR] Database connection failed"

pattern = r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<message>.+)'
m = re.match(pattern, log)

if m:
    print(m.group('date'))     # "2026-02-26"
    print(m.group('level'))    # "ERROR"
    print(m.group('message'))  # "Database connection failed"
    print(m.groupdict())       # {'date': '2026-02-26', 'time': '14:30:45', ...}
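
Named groups pair naturally with `groupdict()` when parsing many lines. The following is a rough sketch under the assumption that every log line follows the format shown above; `LOG_LINE`, `parse_log`, and the sample lines are illustrative names and data.

```python
import re

LOG_LINE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'\[(?P<level>\w+)\] (?P<message>.+)'
)

def parse_log(lines):
    """Yield one dict per line that matches LOG_LINE; skip the rest."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            yield m.groupdict()

sample = [
    "2026-02-26 14:30:45 [ERROR] Database connection failed",
    "not a log line",
    "2026-02-26 14:31:02 [INFO] Retrying",
]
records = list(parse_log(sample))
errors = [r['message'] for r in records if r['level'] == 'ERROR']
print(len(records))  # 2
print(errors)        # ['Database connection failed']
```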

Non-capturing groups

# (?:...) groups the alternatives without capturing them
pattern = r'(?:https?|ftp)://[\w./\-]+'
urls = re.findall(pattern, "Visit https://example.com or ftp://files.example.com")
print(urls)  # ['https://example.com', 'ftp://files.example.com']
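
The choice between the two group types matters most with `re.findall`, which returns group contents rather than whole matches whenever the pattern contains a capturing group. A small comparison using the same sample text:

```python
import re

text = "Visit https://example.com or ftp://files.example.com"

# With a capturing group, findall returns only the group's text
schemes = re.findall(r'(https?|ftp)://[\w./\-]+', text)

# With a non-capturing group, findall returns the whole match
urls = re.findall(r'(?:https?|ftp)://[\w./\-]+', text)

print(schemes)  # ['https', 'ftp']
print(urls)     # ['https://example.com', 'ftp://files.example.com']
```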

Lookahead and lookbehind

Positive lookahead / negative lookahead

# Positive lookahead: (?=...) asserts that the pattern follows, without including it in the match
passwords = ["abc123", "password", "Str0ng!Pass", "12345"]
strong = [p for p in passwords
          if re.match(r'(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}', p)]
print(strong)  # ['Str0ng!Pass']

# Negative lookahead: (?!...) asserts that the pattern does not follow
# Names that do not start with "test"
lines = ["test_func", "main_func", "test_class", "helper"]
non_test = [l for l in lines if re.match(r'(?!test)\w+', l)]
print(non_test)  # ['main_func', 'helper']
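
Each stacked `(?=.*X)` lookahead is an independent condition tested from the same position, so for single-line input the combined password pattern behaves like several separate searches. The sketch below spells the rules out one by one, which also makes it possible to report which rule failed; `RULES` and `password_problems` are hypothetical names.

```python
import re

# One entry per lookahead in the combined pattern, checked separately
RULES = {
    "uppercase letter": re.compile(r'[A-Z]'),
    "digit": re.compile(r'\d'),
    "symbol": re.compile(r'[!@#$%^&*]'),
}

def password_problems(password, min_length=8):
    """Return the names of the rules the password does not satisfy."""
    problems = [name for name, rx in RULES.items() if not rx.search(password)]
    if len(password) < min_length:
        problems.append(f"at least {min_length} characters")
    return problems

print(password_problems("Str0ng!Pass"))  # []
print(password_problems("password"))     # ['uppercase letter', 'digit', 'symbol']
```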

Positive lookbehind / negative lookbehind

# Positive lookbehind: (?<=...) asserts that the pattern precedes the match
text = "Price: $100, Tax: $8, Total: $108"
amounts = re.findall(r'(?<=\$)\d+', text)
print(amounts)  # ['100', '8', '108']

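# Negative lookbehind: (?<!...) matches only where the pattern does NOT precede
text = "Price: $100, Qty: 3, Total: $300"
print(re.findall(r'(?<![$\d])\d+', text))  # ['3']

# Filtering on what FOLLOWS needs a negative lookahead; \d in [\d.-] stops backtracking to a shorter number
text = "v1.0 v2.0-beta v3.0 v4.0-rc1"
stable = re.findall(r'v\d+\.\d+(?![\d.-])', text)
print(stable)  # ['v1.0', 'v3.0']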
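
One restriction worth knowing: the re module only accepts lookbehind patterns of fixed width. The sketch below shows the compile-time error for a variable-width lookbehind and a capturing-group alternative that extracts the same amounts.

```python
import re

text = "Price: $100, Tax: $8, Total: $108"

# A variable-width lookbehind is rejected by the re module at compile time
try:
    re.compile(r'(?<=\$\d*)\d')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# A capturing group gives the same result as (?<=\$)\d+ without any lookbehind
amounts = re.findall(r'\$(\d+)', text)

print(variable_width_ok)  # False
print(amounts)            # ['100', '8', '108']
```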

Practical patterns

Extracting email addresses

text = "Contact us at info@example.com or support@test.co.jp"
pattern = r'[\w.+-]+@[\w-]+\.[\w.]+'
emails = re.findall(pattern, text)
print(emails)  # ['info@example.com', 'support@test.co.jp']
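
Because the final `[\w.]+` also accepts dots, the pattern above swallows a sentence-ending period. The following sketch shows one possible tightening, in which each domain label must follow a dot. It is still a pragmatic extractor, not a full address validator, and the sample text is made up.

```python
import re

text = "Mail info@example.com. Or support@test.co.jp!"

original = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)

# Each domain label must follow a dot, so a trailing "." is not consumed
tightened = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text)

print(original)   # ['info@example.com.', 'support@test.co.jp']
print(tightened)  # ['info@example.com', 'support@test.co.jp']
```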

Splitting CSV safely

# Split on commas, but ignore commas inside quotes
line = 'John,"Doe, Jr.",30,"New York, NY"'
pattern = r',(?=(?:[^"]*"[^"]*")*[^"]*$)'
fields = re.split(pattern, line)
print(fields)  # ['John', '"Doe, Jr."', '30', '"New York, NY"']
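
The regex split keeps the surrounding quotes and does not interpret doubled quotes (`""`). When the input is real CSV, the standard library's `csv` module is usually the more robust choice. A minimal sketch for comparison:

```python
import csv

line = 'John,"Doe, Jr.",30,"New York, NY"'

# csv.reader accepts any iterable of lines; here a one-element list
fields = next(csv.reader([line]))
print(fields)  # ['John', 'Doe, Jr.', '30', 'New York, NY']

# Doubled quotes inside a quoted field are unescaped as well
escaped = next(csv.reader(['a,"say ""hi"", ok",b']))
print(escaped)  # ['a', 'say "hi", ok', 'b']
```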

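Validating IP addresses

def is_valid_ipv4(ip):
    # Each octet is 0-255; note that leading zeros such as "01" are also accepted
    pattern = r'(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)'
    # fullmatch (unlike match with '$') rejects a trailing newline; re.ASCII limits \d to 0-9
    return bool(re.fullmatch(pattern, ip, re.ASCII))

print(is_valid_ipv4("192.168.1.1"))   # True
print(is_valid_ipv4("256.1.1.1"))     # False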
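
For validation alone, the standard library's `ipaddress` module can replace the regex. The sketch below wraps it in a hypothetical helper, `is_valid_ipv4_stdlib`; how leading zeros are treated depends on the Python version, so that case is left out here.

```python
import ipaddress

def is_valid_ipv4_stdlib(ip):
    """Validate with the standard library's parser instead of a regex."""
    try:
        ipaddress.IPv4Address(ip)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False

print(is_valid_ipv4_stdlib("192.168.1.1"))    # True
print(is_valid_ipv4_stdlib("256.1.1.1"))      # False
print(is_valid_ipv4_stdlib("192.168.1.1\n"))  # False
```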

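Performance optimization

Compiled patterns

When the same pattern is used repeatedly, precompile it with re.compile(). The re module already caches recently compiled patterns, so the gain comes from skipping the per-call cache lookup and is usually modest; a named pattern object also improves readability.

# Less efficient: every call goes through re's internal pattern cache
for line in lines:
    re.search(r'\d{4}-\d{2}-\d{2}', line)

# More efficient: compile once and reuse the pattern object
date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')
for line in lines:
    date_pattern.search(line)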
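
In practice a compiled pattern is often kept as a module-level constant and reused by helper functions. A small sketch of that layout follows; `DATE_PATTERN`, `lines_with_dates`, and the sample data are illustrative.

```python
import re

# Compiled once at import time and reused by every call
DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')

def lines_with_dates(lines):
    """Return only the lines that contain a YYYY-MM-DD date."""
    return [line for line in lines if DATE_PATTERN.search(line)]

sample = ["2026-02-26 started", "no date here", "finished 2026-02-27"]
dated = lines_with_dates(sample)
print(dated)  # ['2026-02-26 started', 'finished 2026-02-27']
```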

Greedy vs. non-greedy matching

html = '<div>Hello</div><div>World</div>'

# Greedy (default): matches as much as possible
print(re.findall(r'<div>.*</div>', html))
# ['<div>Hello</div><div>World</div>']

# Non-greedy: matches as little as possible (append ?)
print(re.findall(r'<div>.*?</div>', html))
# ['<div>Hello</div>', '<div>World</div>']
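
A third option is a negated character class, which states directly what the content may not contain. The sketch below assumes flat, non-nested tags like the sample; for real HTML a parser is the safer tool.

```python
import re

html = '<div>Hello</div><div>World</div>'

# [^<]* cannot run past the next "<", so it never spans two tags
blocks = re.findall(r'<div>[^<]*</div>', html)
print(blocks)  # ['<div>Hello</div>', '<div>World</div>']

# A capturing group extracts just the inner text
inner = re.findall(r'<div>([^<]*)</div>', html)
print(inner)  # ['Hello', 'World']
```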

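Avoiding catastrophic backtracking

# Dangerous: catastrophic backtracking (ReDoS) caused by the nested quantifier
# re.match(r'(a+)+b', 'a' * 30)  # extremely slow

# Safe: an equivalent pattern without the nested quantifier
# re.match(r'a+b', 'a' * 30)  # fails almost immediately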
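
The effect can be observed safely with short inputs. The sketch below times both patterns on strings that cannot match and checks that they accept the same strings. The input lengths are kept small on purpose, and the timings depend on the machine, so none are shown.

```python
import re
import time

def time_match(pattern, text):
    """Return (match result, elapsed seconds) for one re.match call."""
    start = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - start

for n in (14, 16, 18):
    text = 'a' * n  # no trailing "b", so both patterns must fail
    nested, t_nested = time_match(r'(a+)+b', text)
    flat, t_flat = time_match(r'a+b', text)
    print(n, nested, flat, f"{t_nested:.4f}s vs {t_flat:.6f}s")

# Both patterns accept exactly the same strings
same = all(
    bool(re.fullmatch(r'(a+)+b', s)) == bool(re.fullmatch(r'a+b', s))
    for s in ("ab", "aaab", "b", "aab", "ba")
)
print(same)  # True
```

Python 3.11 and later also accept atomic groups `(?>...)` and possessive quantifiers such as `a++`, which are another way to prevent this kind of backtracking.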

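Main flags of the re module

Flag	Description
`re.IGNORECASE` (`re.I`)	Case-insensitive matching
`re.MULTILINE` (`re.M`)	`^`/`$` match at the start/end of each line
`re.DOTALL` (`re.S`)	`.` also matches newlines
`re.VERBOSE` (`re.X`)	Allows comments and whitespace inside the pattern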

pattern = re.compile(r'''
    (?P<year>\d{4})   # year
    -(?P<month>\d{2}) # month
    -(?P<day>\d{2})   # day
''', re.VERBOSE)
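
As a usage sketch, the verbose pattern can be applied like any other compiled pattern, and several flags can be combined with `|`. The names `DATE` and `log` and the sample strings are illustrative.

```python
import re

DATE = re.compile(r'''
    (?P<year>\d{4})   # year
    -(?P<month>\d{2}) # month
    -(?P<day>\d{2})   # day
''', re.VERBOSE)

m = DATE.search("released on 2026-02-26")
parts = m.groupdict()
print(parts)  # {'year': '2026', 'month': '02', 'day': '26'}

log = "info: start\nERROR: disk full\nerror: retry"
# With re.MULTILINE, ^ and $ match at every line; re.IGNORECASE accepts "ERROR" too
errors = re.findall(r'^error: (.+)$', log, re.IGNORECASE | re.MULTILINE)
print(errors)  # ['disk full', 'retry']
```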


