How do I recursively find all files matching a pattern with pathlib?

Use the rglob() method for recursive globbing. For example, Path('.').rglob('*.py') finds all Python files in the current directory and all subdirectories. This is equivalent to glob('**/*.py') but shorter. rglob() returns a generator, so it handles large directory trees efficiently without loading all results into memory at once.

What is the difference between Path.resolve() and Path.absolute()?

Path.resolve() returns the absolute path with all symlinks resolved and '..' components eliminated. It gives you the canonical, real filesystem path. Path.absolute() returns the absolute path without resolving symlinks or normalizing '..' segments. In most cases, resolve() is preferred because it gives a deterministic, fully normalized path. Use resolve(strict=True) to also verify that the path exists, raising FileNotFoundError if it does not.

Python pathlib: The Complete Guide for 2026

Q: What is pathlib and why should I use it instead of os.path?

pathlib is a Python standard library module (introduced in Python 3.4) that provides an object-oriented interface for filesystem paths. Instead of string manipulation with os.path.join() and os.path.basename(), pathlib uses Path objects with methods and operators like the / operator for joining paths. It produces cleaner, more readable code, handles cross-platform differences automatically, and combines path manipulation with file I/O in a single API. The Python documentation recommends pathlib for new code.

Q: Does pathlib work on both Windows and Linux?

Yes. When you use pathlib.Path(), Python automatically returns a PosixPath on Linux and macOS, or a WindowsPath on Windows. Path objects handle separator differences (forward slash vs backslash) transparently. You can use the / operator to join paths on any platform and pathlib converts them correctly. PurePosixPath and PureWindowsPath are available for working with paths from a different OS without filesystem access.

Q: Can I use pathlib Path objects with open() and other functions that expect strings?

Yes. Since Python 3.6, Path objects implement the os.fspath() protocol, which means they work everywhere a string path is accepted: open(), json.load(), csv.reader(), shutil.copy(), subprocess.run(), and most third-party libraries. You can also explicitly convert a Path to a string with str(path) if needed for older libraries that do not support the os.fspath() protocol.

Q: Is pathlib slower than os.path?

Path object creation has slightly more overhead than raw string operations with os.path because pathlib creates objects rather than returning strings. However, for real-world use cases the difference is negligible since filesystem I/O (disk access, network mounts) dominates execution time. In Python 3.12 and later, pathlib performance has been significantly improved. Only in extreme scenarios processing millions of paths in tight loops would the overhead matter, and even then you can batch operations or use os.scandir() for the hot path.

February 12, 2026 · 18 min read

If you are still joining file paths with os.path.join() and splitting extensions with os.path.splitext(), it is time to upgrade. Python's pathlib module, part of the standard library since Python 3.4, replaces clunky string-based path manipulation with an elegant, object-oriented API. Paths become objects with methods, operators, and properties — making your code shorter, more readable, and less error-prone.

⚙ Related tools: Format config files with our JSON Formatter, reference Python syntax with the Python Cheat Sheet, and set up project environments with our Virtual Environments Guide.

Why pathlib Over os.path
Creating Path Objects
Path Components
Path Manipulation
File Operations
Directory Operations
File Metadata and Checks
Path Resolution and Relative Paths
os.path to pathlib Migration Cheat Sheet
Real-World Patterns
pathlib with Other Libraries
Performance Considerations
Frequently Asked Questions

1. Why pathlib Over os.path

The os.path module treats paths as raw strings. You call standalone functions, pass strings around, and hope the slashes are right. pathlib wraps paths in objects with methods, properties, and operator overloads:

# os.path approach — string manipulation everywhere
import os
base = os.path.dirname(os.path.abspath(__file__))
config = os.path.join(base, "config", "settings.json")
name = os.path.splitext(os.path.basename(config))[0]

# pathlib approach — clean, object-oriented
from pathlib import Path
base = Path(__file__).resolve().parent
config = base / "config" / "settings.json"
name = config.stem

Key advantages of pathlib:

Operator overloading — the / operator joins paths, replacing nested os.path.join() calls
Properties instead of functions — .name, .stem, .suffix, .parent are attributes
Built-in I/O — read_text(), write_text() eliminate open() context managers for simple operations
Cross-platform — Path() automatically returns the correct type for your OS
Standard library — no installation needed, available in Python 3.4+

2. Creating Path Objects

from pathlib import Path, PurePosixPath, PureWindowsPath

# Basic construction
p = Path("src/models/user.py")
p = Path("/etc/nginx/nginx.conf")
p = Path("src", "models", "user.py")  # auto-joined

# Current directory and home directory
cwd = Path.cwd()           # /home/dev/projects/myapp
home = Path.home()          # /home/dev

# From __file__ (common in scripts)
project_root = Path(__file__).resolve().parent.parent

# From environment variables
import os
data_dir = Path(os.environ.get("DATA_DIR", "/tmp/data"))

# Expand user home directory
p = Path("~/documents/notes.txt").expanduser()

Path Class Hierarchy

from pathlib import PurePosixPath, PureWindowsPath, Path

# PurePaths work without filesystem access — manipulate
# paths from another OS without touching disk
win_path = PureWindowsPath(r"C:\Users\dev\project\main.py")
print(win_path.name)    # main.py
print(win_path.drive)   # C:

# Path() returns the concrete type for your OS
p = Path(".")  # PosixPath on Linux, WindowsPath on Windows

3. Path Components

Every Path object exposes its components as properties:

from pathlib import Path
p = Path("/home/dev/projects/myapp/src/main.py")

p.name        # 'main.py'         — filename with extension
p.stem        # 'main'            — filename without extension
p.suffix      # '.py'             — file extension (with dot)
p.suffixes    # ['.py']           — all extensions (e.g., ['.tar', '.gz'])
p.parent      # Path('/home/dev/projects/myapp/src')
p.parents[1]  # Path('/home/dev/projects/myapp')
p.parts       # ('/', 'home', 'dev', 'projects', 'myapp', 'src', 'main.py')
p.anchor      # '/'               — root + drive (just '/' on Unix)

# Multi-extension files
archive = Path("/backups/data.tar.gz")
archive.suffix     # '.gz'           — last extension only
archive.suffixes   # ['.tar', '.gz'] — all extensions
archive.stem       # 'data.tar'      — name minus last suffix

Iterating Over Parents

from pathlib import Path

# Find a project root by looking for a marker file
def find_project_root(start: Path) -> Path | None:
    for parent in [start, *start.parents]:
        if (parent / "pyproject.toml").exists():
            return parent
    return None

root = find_project_root(Path.cwd())

4. Path Manipulation

Path objects are immutable. Every manipulation returns a new Path:

from pathlib import Path
p = Path("src/models/user.py")

# Join paths with the / operator
config = Path("project") / "config" / "settings.json"
log = Path("/var/log") / "nginx" / "access.log"

# Change the filename, extension, or stem
p.with_name("account.py")      # Path('src/models/account.py')
p.with_suffix(".pyi")           # Path('src/models/user.pyi')
p.with_suffix("")               # Path('src/models/user')
p.with_stem("admin")            # Path('src/models/admin.py')  (3.12+)

# Build paths dynamically
from datetime import date
base = Path("output")
report = base / date.today().isoformat() / "report.html"
# Path('output/2026-02-12/report.html')

# Build from a list of segments
parts = ["data", "processed", "2026", "q1", "results.csv"]
path = Path(*parts)

5. File Operations

pathlib includes methods for common file I/O, eliminating open() for simple cases:

from pathlib import Path

config = Path("config.json")

# Read/write text
content = config.read_text(encoding="utf-8")
config.write_text('{"debug": true}', encoding="utf-8")

# Read/write binary data
data = Path("logo.png").read_bytes()
Path("logo_copy.png").write_bytes(data)

Working with JSON

from pathlib import Path
import json

config_path = Path("config/settings.json")

# Read JSON — two approaches
config = json.loads(config_path.read_text(encoding="utf-8"))
with open(config_path, encoding="utf-8") as f:  # Path works with open()
    config = json.load(f)

# Write JSON
config["version"] = "2.0"
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")

File Copy, Move, and Delete

from pathlib import Path
import shutil

src, dst = Path("data/input.csv"), Path("backup/input.csv")

shutil.copy2(src, dst)                            # copy with metadata
shutil.copytree(Path("project"), Path("backup"))   # copy directory tree

src.rename(dst)                  # move within same filesystem
src.replace(dst)                 # overwrite dst if it exists

Path("temp.txt").unlink(missing_ok=True)  # delete file (Python 3.8+)
Path("empty_dir").rmdir()                 # delete empty directory
shutil.rmtree(Path("build"))              # delete directory with contents

6. Directory Operations

from pathlib import Path

project = Path("myapp")
project.mkdir(exist_ok=True)                               # create directory
(project / "src" / "models").mkdir(parents=True, exist_ok=True)  # nested

# List directory contents (non-recursive)
for item in project.iterdir():
    print(item.name, "dir" if item.is_dir() else "file")

Glob Patterns

from pathlib import Path
src = Path("src")

list(src.glob("*.py"))           # Python files in src/
list(src.glob("test_*.py"))      # test files in src/
list(src.rglob("*.py"))          # all Python files recursively
list(src.rglob("*.html"))        # all HTML files, any depth
list(src.glob("*/models/*.py"))  # Python files in any models/ subdir

# rglob("*.py") is shorthand for glob("**/*.py")
python_files = sorted(src.rglob("*.py"))
print(f"Found {len(python_files)} Python files")

Filtering Directory Contents

from pathlib import Path
import time

project = Path(".")
files = [f for f in project.iterdir() if f.is_file()]
dirs  = [d for d in project.iterdir() if d.is_dir()]

# Files larger than 1 MB
big = [f for f in project.rglob("*")
       if f.is_file() and f.stat().st_size > 1_000_000]

# Files modified in the last 24 hours
cutoff = time.time() - 86400
recent = [f for f in project.rglob("*")
          if f.is_file() and f.stat().st_mtime > cutoff]

7. File Metadata and Checks

from pathlib import Path
from datetime import datetime

p = Path("src/main.py")

# Existence and type checks
p.exists()       # True if path exists
p.is_file()      # True if regular file
p.is_dir()       # True if directory
p.is_symlink()   # True if symbolic link

# File statistics
stat = p.stat()
stat.st_size     # file size in bytes
stat.st_mtime    # last modification time (Unix timestamp)

# Human-readable modification time
mtime = datetime.fromtimestamp(p.stat().st_mtime)
print(f"Modified: {mtime:%Y-%m-%d %H:%M}")

# Ownership (Unix only)
p.owner()   # 'dev'
p.group()   # 'dev'

# Permissions
import stat as stat_mod
p.chmod(0o755)   # rwxr-xr-x (make executable)
p.chmod(0o644)   # rw-r--r-- (read-only for others)

8. Path Resolution and Relative Paths

from pathlib import Path

# resolve() — absolute path with symlinks resolved and '..' eliminated
p = Path("src/../config/settings.json")
p.resolve()   # Path('/home/dev/projects/myapp/config/settings.json')

# resolve(strict=True) — also verifies the path exists
try:
    p.resolve(strict=True)
except FileNotFoundError:
    print("Path does not exist")

# absolute() — absolute path without resolving symlinks
Path("config/settings.json").absolute()

# relative_to() — make a path relative to another
full = Path("/home/dev/projects/myapp/src/main.py")
base = Path("/home/dev/projects/myapp")
full.relative_to(base)   # Path('src/main.py')

# is_relative_to() — check without raising (Python 3.9+)
Path("/home/dev/file.txt").is_relative_to("/home")  # True
Path("/etc/nginx.conf").is_relative_to("/home")      # False

Path("/etc/hosts").is_absolute()       # True
Path("config/app.toml").is_absolute()  # False

Safe Path Construction

from pathlib import Path

def safe_join(base: Path, user_input: str) -> Path:
    """Join paths safely, preventing directory traversal attacks."""
    result = (base / user_input).resolve()
    if not result.is_relative_to(base.resolve()):
        raise ValueError(f"Path traversal detected: {user_input}")
    return result

safe_join(Path("/var/data"), "reports/q1.csv")       # OK
safe_join(Path("/var/data"), "../../etc/passwd")      # ValueError

9. os.path to pathlib Migration Cheat Sheet

Every common os.path and os operation mapped to its pathlib equivalent:

os / os.path	pathlib equivalent
os.path.join(a, b)	Path(a) / b
os.path.basename(p)	path.name
os.path.dirname(p)	path.parent
os.path.splitext(p)	path.stem, path.suffix
os.path.abspath(p)	path.absolute()
os.path.realpath(p)	path.resolve()
os.path.relpath(p, start)	path.relative_to(start)
os.path.exists(p)	path.exists()
os.path.isfile(p)	path.is_file()
os.path.isdir(p)	path.is_dir()
os.path.islink(p)	path.is_symlink()
os.path.getsize(p)	path.stat().st_size
os.path.getmtime(p)	path.stat().st_mtime
os.path.expanduser(p)	path.expanduser()
os.getcwd()	Path.cwd()
os.listdir(p)	path.iterdir()
os.mkdir(p)	path.mkdir()
os.makedirs(p)	path.mkdir(parents=True)
os.rename(src, dst)	path.rename(dst)
os.remove(p)	path.unlink()
os.rmdir(p)	path.rmdir()
os.stat(p)	path.stat()
os.chmod(p, mode)	path.chmod(mode)
glob.glob(pattern)	path.glob(pattern)
glob.glob(recursive=True)	path.rglob(pattern)
open(p, "r")	path.read_text()
open(p, "w").write()	path.write_text(data)
open(p, "rb")	path.read_bytes()

10. Real-World Patterns

Loading Config Files Relative to the Script

from pathlib import Path
import json

BASE_DIR = Path(__file__).resolve().parent

def load_config(name: str = "config.json") -> dict:
    config_path = BASE_DIR / "config" / name
    if not config_path.exists():
        raise FileNotFoundError(f"Config not found: {config_path}")
    return json.loads(config_path.read_text(encoding="utf-8"))

config = load_config()
db_url = config["database"]["url"]

Batch File Processing

from pathlib import Path

input_dir = Path("data/raw")
output_dir = Path("data/processed")
output_dir.mkdir(parents=True, exist_ok=True)

for csv_file in input_dir.rglob("*.csv"):
    # Mirror the directory structure in output
    relative = csv_file.relative_to(input_dir)
    output_path = output_dir / relative.with_suffix(".json")
    output_path.parent.mkdir(parents=True, exist_ok=True)

    content = csv_file.read_text(encoding="utf-8")
    processed = transform(content)  # your processing function
    output_path.write_text(processed, encoding="utf-8")

Cleaning Up Build Artifacts

from pathlib import Path
import shutil

project = Path(".")

# Remove all __pycache__ directories
for cache_dir in project.rglob("__pycache__"):
    shutil.rmtree(cache_dir)

# Remove all .pyc files
for pyc in project.rglob("*.pyc"):
    pyc.unlink()

# Remove build artifacts
for pattern in ["build", "dist", "*.egg-info"]:
    for match in project.glob(pattern):
        shutil.rmtree(match)

Walking a Project Tree (Python 3.12+)

from pathlib import Path

# Path.walk() replaces os.walk() with native pathlib support
for dirpath, dirnames, filenames in Path("src").walk():
    dirnames[:] = [d for d in dirnames if not d.startswith(".")]
    for filename in filenames:
        filepath = dirpath / filename
        print(filepath)

11. pathlib with Other Libraries

Since Python 3.6, Path objects work anywhere a string path is accepted:

from pathlib import Path
import json, csv, shutil, subprocess

# open(), json, csv — all accept Path objects directly
with open(Path("data/users.json"), encoding="utf-8") as f:
    users = json.load(f)

csv_path = Path("data/export.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "email"])

# shutil — copy, move, archive
shutil.copy2(Path("file.txt"), Path("backup/file.txt"))
shutil.copytree(Path("project"), Path("project_backup"))
shutil.make_archive("release", "zip", root_dir=Path("dist"))

# subprocess
result = subprocess.run(
    [str(Path("scripts/deploy.sh")), "--env", "production"],
    cwd=Path("project"), capture_output=True, text=True
)

Pydantic Integration

from pathlib import Path
from pydantic import BaseModel, field_validator

class AppConfig(BaseModel):
    data_dir: Path
    log_file: Path

    @field_validator("data_dir")
    @classmethod
    def dir_must_exist(cls, v: Path) -> Path:
        if not v.exists():
            raise ValueError(f"Directory does not exist: {v}")
        return v.resolve()

# Pydantic coerces strings to Path objects automatically
config = AppConfig(data_dir="/var/data", log_file="app.log")

See our Pydantic Complete Guide for more on integrating Path with data validation.

Downloading Files with httpx

from pathlib import Path
import httpx

output = Path("downloads/data.csv")
output.parent.mkdir(parents=True, exist_ok=True)

response = httpx.get("https://example.com/data.csv")
output.write_bytes(response.content)
print(f"Downloaded {output.stat().st_size} bytes to {output}")

See our Python requests and httpx guide for more HTTP patterns.

12. Performance Considerations

Path objects have more overhead than raw strings, but it rarely matters:

from pathlib import Path
import os.path, timeit

# Object creation: pathlib is ~3-5x slower than os.path.join
# But we are talking microseconds, not milliseconds
timeit.timeit(lambda: Path("a") / "b" / "c.txt", number=100_000)
# ~0.15 seconds for 100K iterations
timeit.timeit(lambda: os.path.join("a", "b", "c.txt"), number=100_000)
# ~0.04 seconds for 100K iterations

# For file I/O, the difference is ZERO — disk access dominates

# When you need maximum performance for large directory scanning,
# os.scandir() avoids creating Path objects per entry:
import os
with os.scandir("/var/log") as entries:
    log_files = [e.name for e in entries if e.is_file()]

# But for normal use, pathlib is the right default:
log_files = [f.name for f in Path("/var/log").iterdir() if f.is_file()]

Bottom line: Only in extreme scenarios processing millions of paths in tight loops does the overhead matter. For everything else, choose pathlib for readability. Python 3.12+ includes significant pathlib performance improvements including the native Path.walk() method.

Frequently Asked Questions

What is pathlib and why should I use it instead of os.path?

pathlib is a standard library module (Python 3.4+) that provides an object-oriented interface for filesystem paths. Instead of calling os.path.join() and os.path.basename(), you use Path objects with properties (.name, .stem, .parent) and operators (/ for joining). It produces cleaner, more readable code and combines path manipulation with file I/O in a single API. The Python documentation recommends pathlib for new code.

Does pathlib work on both Windows and Linux?

Yes. Path() automatically returns a PosixPath on Linux/macOS or a WindowsPath on Windows. Path objects handle separator differences transparently. The / operator works for joining paths on any platform. Use PurePosixPath and PureWindowsPath for working with paths from another OS without filesystem access.

Can I use pathlib Path objects with open() and other functions?

Yes. Since Python 3.6, Path objects implement the os.fspath() protocol, so they work everywhere a string path is accepted: open(), json.load(), csv.reader(), shutil.copy(), subprocess.run(), and most third-party libraries. You can also convert to a string with str(path) for older libraries.

How do I recursively find all files matching a pattern?

Use the rglob() method. For example, Path('.').rglob('*.py') finds all Python files in the current directory and all subdirectories. This is equivalent to glob('**/*.py'). The method returns a generator, so it handles large directory trees efficiently without loading all results into memory.

What is the difference between resolve() and absolute()?

resolve() returns the absolute path with symlinks resolved and .. components eliminated — the canonical, real filesystem path. absolute() returns the absolute path without resolving symlinks or normalizing .. segments. Use resolve() in most cases for a deterministic, normalized path. Use resolve(strict=True) to also verify the path exists.

Is pathlib slower than os.path?

Path object creation is 3–5x slower than raw os.path string operations in microbenchmarks. However, the difference is microseconds per operation — negligible in real applications where filesystem I/O dominates. Python 3.12+ includes significant pathlib performance improvements. Only in extreme scenarios processing millions of paths in tight loops would the overhead matter; for everything else, choose pathlib for readability.

Conclusion

pathlib is the modern standard for working with filesystem paths in Python. The / operator for joining, properties like .stem and .suffix for components, and built-in methods like read_text() and glob() make your code shorter and more expressive. Every os.path pattern has a cleaner pathlib equivalent, and since Python 3.6 Path objects work with virtually every function that accepts file paths.

Start by replacing os.path.join() calls in your existing code with the / operator. Then adopt read_text() and write_text() for simple file I/O. Before long, you will never reach for os.path again.

⚙ Keep learning: Validate data structures with our Pydantic Complete Guide, manage project dependencies with the Virtual Environments Guide, and make HTTP requests with our Python requests & httpx Guide.