Python pathlib: The Complete Guide for 2026
If you are still joining file paths with os.path.join() and splitting extensions with os.path.splitext(), it is time to upgrade. Python's pathlib module, part of the standard library since Python 3.4, replaces clunky string-based path manipulation with an elegant, object-oriented API. Paths become objects with methods, operators, and properties — making your code shorter, more readable, and less error-prone.
Table of Contents
- Why pathlib Over os.path
- Creating Path Objects
- Path Components
- Path Manipulation
- File Operations
- Directory Operations
- File Metadata and Checks
- Path Resolution and Relative Paths
- os.path to pathlib Migration Cheat Sheet
- Real-World Patterns
- pathlib with Other Libraries
- Performance Considerations
- Frequently Asked Questions
1. Why pathlib Over os.path
The os.path module treats paths as raw strings. You call standalone functions, pass strings around, and hope the slashes are right. pathlib wraps paths in objects with methods, properties, and operator overloads:
# os.path approach — string manipulation everywhere
import os
base = os.path.dirname(os.path.abspath(__file__))
config = os.path.join(base, "config", "settings.json")
name = os.path.splitext(os.path.basename(config))[0]
# pathlib approach — clean, object-oriented
from pathlib import Path
base = Path(__file__).resolve().parent
config = base / "config" / "settings.json"
name = config.stem
Key advantages of pathlib:
- Operator overloading — the
/operator joins paths, replacing nestedos.path.join()calls - Properties instead of functions —
.name,.stem,.suffix,.parentare attributes - Built-in I/O —
read_text(),write_text()eliminateopen()context managers for simple operations - Cross-platform —
Path()automatically returns the correct type for your OS - Standard library — no installation needed, available in Python 3.4+
2. Creating Path Objects
from pathlib import Path, PurePosixPath, PureWindowsPath
# Basic construction
p = Path("src/models/user.py")
p = Path("/etc/nginx/nginx.conf")
p = Path("src", "models", "user.py") # auto-joined
# Current directory and home directory
cwd = Path.cwd() # /home/dev/projects/myapp
home = Path.home() # /home/dev
# From __file__ (common in scripts)
project_root = Path(__file__).resolve().parent.parent
# From environment variables
import os
data_dir = Path(os.environ.get("DATA_DIR", "/tmp/data"))
# Expand user home directory
p = Path("~/documents/notes.txt").expanduser()
Path Class Hierarchy
from pathlib import PurePosixPath, PureWindowsPath, Path
# PurePaths work without filesystem access — manipulate
# paths from another OS without touching disk
win_path = PureWindowsPath(r"C:\Users\dev\project\main.py")
print(win_path.name) # main.py
print(win_path.drive) # C:
# Path() returns the concrete type for your OS
p = Path(".") # PosixPath on Linux, WindowsPath on Windows
3. Path Components
Every Path object exposes its components as properties:
from pathlib import Path
p = Path("/home/dev/projects/myapp/src/main.py")
p.name # 'main.py' — filename with extension
p.stem # 'main' — filename without extension
p.suffix # '.py' — file extension (with dot)
p.suffixes # ['.py'] — all extensions (e.g., ['.tar', '.gz'])
p.parent # Path('/home/dev/projects/myapp/src')
p.parents[1] # Path('/home/dev/projects/myapp')
p.parts # ('/', 'home', 'dev', 'projects', 'myapp', 'src', 'main.py')
p.anchor # '/' — root + drive (just '/' on Unix)
# Multi-extension files
archive = Path("/backups/data.tar.gz")
archive.suffix # '.gz' — last extension only
archive.suffixes # ['.tar', '.gz'] — all extensions
archive.stem # 'data.tar' — name minus last suffix
Iterating Over Parents
from pathlib import Path
# Find a project root by looking for a marker file
def find_project_root(start: Path) -> Path | None:
for parent in [start, *start.parents]:
if (parent / "pyproject.toml").exists():
return parent
return None
root = find_project_root(Path.cwd())
4. Path Manipulation
Path objects are immutable. Every manipulation returns a new Path:
from pathlib import Path
p = Path("src/models/user.py")
# Join paths with the / operator
config = Path("project") / "config" / "settings.json"
log = Path("/var/log") / "nginx" / "access.log"
# Change the filename, extension, or stem
p.with_name("account.py") # Path('src/models/account.py')
p.with_suffix(".pyi") # Path('src/models/user.pyi')
p.with_suffix("") # Path('src/models/user')
p.with_stem("admin") # Path('src/models/admin.py') (3.12+)
# Build paths dynamically
from datetime import date
base = Path("output")
report = base / date.today().isoformat() / "report.html"
# Path('output/2026-02-12/report.html')
# Build from a list of segments
parts = ["data", "processed", "2026", "q1", "results.csv"]
path = Path(*parts)
5. File Operations
pathlib includes methods for common file I/O, eliminating open() for simple cases:
from pathlib import Path
config = Path("config.json")
# Read/write text
content = config.read_text(encoding="utf-8")
config.write_text('{"debug": true}', encoding="utf-8")
# Read/write binary data
data = Path("logo.png").read_bytes()
Path("logo_copy.png").write_bytes(data)
Working with JSON
from pathlib import Path
import json
config_path = Path("config/settings.json")
# Read JSON — two approaches
config = json.loads(config_path.read_text(encoding="utf-8"))
with open(config_path, encoding="utf-8") as f: # Path works with open()
config = json.load(f)
# Write JSON
config["version"] = "2.0"
config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
File Copy, Move, and Delete
from pathlib import Path
import shutil
src, dst = Path("data/input.csv"), Path("backup/input.csv")
shutil.copy2(src, dst) # copy with metadata
shutil.copytree(Path("project"), Path("backup")) # copy directory tree
src.rename(dst) # move within same filesystem
src.replace(dst) # overwrite dst if it exists
Path("temp.txt").unlink(missing_ok=True) # delete file (Python 3.8+)
Path("empty_dir").rmdir() # delete empty directory
shutil.rmtree(Path("build")) # delete directory with contents
6. Directory Operations
from pathlib import Path
project = Path("myapp")
project.mkdir(exist_ok=True) # create directory
(project / "src" / "models").mkdir(parents=True, exist_ok=True) # nested
# List directory contents (non-recursive)
for item in project.iterdir():
print(item.name, "dir" if item.is_dir() else "file")
Glob Patterns
from pathlib import Path
src = Path("src")
list(src.glob("*.py")) # Python files in src/
list(src.glob("test_*.py")) # test files in src/
list(src.rglob("*.py")) # all Python files recursively
list(src.rglob("*.html")) # all HTML files, any depth
list(src.glob("*/models/*.py")) # Python files in any models/ subdir
# rglob("*.py") is shorthand for glob("**/*.py")
python_files = sorted(src.rglob("*.py"))
print(f"Found {len(python_files)} Python files")
Filtering Directory Contents
from pathlib import Path
import time
project = Path(".")
files = [f for f in project.iterdir() if f.is_file()]
dirs = [d for d in project.iterdir() if d.is_dir()]
# Files larger than 1 MB
big = [f for f in project.rglob("*")
if f.is_file() and f.stat().st_size > 1_000_000]
# Files modified in the last 24 hours
cutoff = time.time() - 86400
recent = [f for f in project.rglob("*")
if f.is_file() and f.stat().st_mtime > cutoff]
7. File Metadata and Checks
from pathlib import Path
from datetime import datetime
p = Path("src/main.py")
# Existence and type checks
p.exists() # True if path exists
p.is_file() # True if regular file
p.is_dir() # True if directory
p.is_symlink() # True if symbolic link
# File statistics
stat = p.stat()
stat.st_size # file size in bytes
stat.st_mtime # last modification time (Unix timestamp)
# Human-readable modification time
mtime = datetime.fromtimestamp(p.stat().st_mtime)
print(f"Modified: {mtime:%Y-%m-%d %H:%M}")
# Ownership (Unix only)
p.owner() # 'dev'
p.group() # 'dev'
# Permissions
import stat as stat_mod
p.chmod(0o755) # rwxr-xr-x (make executable)
p.chmod(0o644) # rw-r--r-- (read-only for others)
8. Path Resolution and Relative Paths
from pathlib import Path
# resolve() — absolute path with symlinks resolved and '..' eliminated
p = Path("src/../config/settings.json")
p.resolve() # Path('/home/dev/projects/myapp/config/settings.json')
# resolve(strict=True) — also verifies the path exists
try:
p.resolve(strict=True)
except FileNotFoundError:
print("Path does not exist")
# absolute() — absolute path without resolving symlinks
Path("config/settings.json").absolute()
# relative_to() — make a path relative to another
full = Path("/home/dev/projects/myapp/src/main.py")
base = Path("/home/dev/projects/myapp")
full.relative_to(base) # Path('src/main.py')
# is_relative_to() — check without raising (Python 3.9+)
Path("/home/dev/file.txt").is_relative_to("/home") # True
Path("/etc/nginx.conf").is_relative_to("/home") # False
Path("/etc/hosts").is_absolute() # True
Path("config/app.toml").is_absolute() # False
Safe Path Construction
from pathlib import Path
def safe_join(base: Path, user_input: str) -> Path:
"""Join paths safely, preventing directory traversal attacks."""
result = (base / user_input).resolve()
if not result.is_relative_to(base.resolve()):
raise ValueError(f"Path traversal detected: {user_input}")
return result
safe_join(Path("/var/data"), "reports/q1.csv") # OK
safe_join(Path("/var/data"), "../../etc/passwd") # ValueError
9. os.path to pathlib Migration Cheat Sheet
Every common os.path and os operation mapped to its pathlib equivalent:
| os / os.path | pathlib equivalent |
|---|---|
| os.path.join(a, b) | Path(a) / b |
| os.path.basename(p) | path.name |
| os.path.dirname(p) | path.parent |
| os.path.splitext(p) | path.stem, path.suffix |
| os.path.abspath(p) | path.absolute() |
| os.path.realpath(p) | path.resolve() |
| os.path.relpath(p, start) | path.relative_to(start) |
| os.path.exists(p) | path.exists() |
| os.path.isfile(p) | path.is_file() |
| os.path.isdir(p) | path.is_dir() |
| os.path.islink(p) | path.is_symlink() |
| os.path.getsize(p) | path.stat().st_size |
| os.path.getmtime(p) | path.stat().st_mtime |
| os.path.expanduser(p) | path.expanduser() |
| os.getcwd() | Path.cwd() |
| os.listdir(p) | path.iterdir() |
| os.mkdir(p) | path.mkdir() |
| os.makedirs(p) | path.mkdir(parents=True) |
| os.rename(src, dst) | path.rename(dst) |
| os.remove(p) | path.unlink() |
| os.rmdir(p) | path.rmdir() |
| os.stat(p) | path.stat() |
| os.chmod(p, mode) | path.chmod(mode) |
| glob.glob(pattern) | path.glob(pattern) |
| glob.glob(recursive=True) | path.rglob(pattern) |
| open(p, "r") | path.read_text() |
| open(p, "w").write() | path.write_text(data) |
| open(p, "rb") | path.read_bytes() |
10. Real-World Patterns
Loading Config Files Relative to the Script
from pathlib import Path
import json
BASE_DIR = Path(__file__).resolve().parent
def load_config(name: str = "config.json") -> dict:
config_path = BASE_DIR / "config" / name
if not config_path.exists():
raise FileNotFoundError(f"Config not found: {config_path}")
return json.loads(config_path.read_text(encoding="utf-8"))
config = load_config()
db_url = config["database"]["url"]
Batch File Processing
from pathlib import Path
input_dir = Path("data/raw")
output_dir = Path("data/processed")
output_dir.mkdir(parents=True, exist_ok=True)
for csv_file in input_dir.rglob("*.csv"):
# Mirror the directory structure in output
relative = csv_file.relative_to(input_dir)
output_path = output_dir / relative.with_suffix(".json")
output_path.parent.mkdir(parents=True, exist_ok=True)
content = csv_file.read_text(encoding="utf-8")
processed = transform(content) # your processing function
output_path.write_text(processed, encoding="utf-8")
Cleaning Up Build Artifacts
from pathlib import Path
import shutil
project = Path(".")
# Remove all __pycache__ directories
for cache_dir in project.rglob("__pycache__"):
shutil.rmtree(cache_dir)
# Remove all .pyc files
for pyc in project.rglob("*.pyc"):
pyc.unlink()
# Remove build artifacts
for pattern in ["build", "dist", "*.egg-info"]:
for match in project.glob(pattern):
shutil.rmtree(match)
Walking a Project Tree (Python 3.12+)
from pathlib import Path
# Path.walk() replaces os.walk() with native pathlib support
for dirpath, dirnames, filenames in Path("src").walk():
dirnames[:] = [d for d in dirnames if not d.startswith(".")]
for filename in filenames:
filepath = dirpath / filename
print(filepath)
11. pathlib with Other Libraries
Since Python 3.6, Path objects work anywhere a string path is accepted:
from pathlib import Path
import json, csv, shutil, subprocess
# open(), json, csv — all accept Path objects directly
with open(Path("data/users.json"), encoding="utf-8") as f:
users = json.load(f)
csv_path = Path("data/export.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["name", "email"])
# shutil — copy, move, archive
shutil.copy2(Path("file.txt"), Path("backup/file.txt"))
shutil.copytree(Path("project"), Path("project_backup"))
shutil.make_archive("release", "zip", root_dir=Path("dist"))
# subprocess
result = subprocess.run(
[str(Path("scripts/deploy.sh")), "--env", "production"],
cwd=Path("project"), capture_output=True, text=True
)
Pydantic Integration
from pathlib import Path
from pydantic import BaseModel, field_validator
class AppConfig(BaseModel):
data_dir: Path
log_file: Path
@field_validator("data_dir")
@classmethod
def dir_must_exist(cls, v: Path) -> Path:
if not v.exists():
raise ValueError(f"Directory does not exist: {v}")
return v.resolve()
# Pydantic coerces strings to Path objects automatically
config = AppConfig(data_dir="/var/data", log_file="app.log")
See our Pydantic Complete Guide for more on integrating Path with data validation.
Downloading Files with httpx
from pathlib import Path
import httpx
output = Path("downloads/data.csv")
output.parent.mkdir(parents=True, exist_ok=True)
response = httpx.get("https://example.com/data.csv")
output.write_bytes(response.content)
print(f"Downloaded {output.stat().st_size} bytes to {output}")
See our Python requests and httpx guide for more HTTP patterns.
12. Performance Considerations
Path objects have more overhead than raw strings, but it rarely matters:
from pathlib import Path
import os.path, timeit
# Object creation: pathlib is ~3-5x slower than os.path.join
# But we are talking microseconds, not milliseconds
timeit.timeit(lambda: Path("a") / "b" / "c.txt", number=100_000)
# ~0.15 seconds for 100K iterations
timeit.timeit(lambda: os.path.join("a", "b", "c.txt"), number=100_000)
# ~0.04 seconds for 100K iterations
# For file I/O, the difference is ZERO — disk access dominates
# When you need maximum performance for large directory scanning,
# os.scandir() avoids creating Path objects per entry:
import os
with os.scandir("/var/log") as entries:
log_files = [e.name for e in entries if e.is_file()]
# But for normal use, pathlib is the right default:
log_files = [f.name for f in Path("/var/log").iterdir() if f.is_file()]
Bottom line: Only in extreme scenarios processing millions of paths in tight loops does the overhead matter. For everything else, choose pathlib for readability. Python 3.12+ includes significant pathlib performance improvements including the native Path.walk() method.
Frequently Asked Questions
What is pathlib and why should I use it instead of os.path?
pathlib is a standard library module (Python 3.4+) that provides an object-oriented interface for filesystem paths. Instead of calling os.path.join() and os.path.basename(), you use Path objects with properties (.name, .stem, .parent) and operators (/ for joining). It produces cleaner, more readable code and combines path manipulation with file I/O in a single API. The Python documentation recommends pathlib for new code.
Does pathlib work on both Windows and Linux?
Yes. Path() automatically returns a PosixPath on Linux/macOS or a WindowsPath on Windows. Path objects handle separator differences transparently. The / operator works for joining paths on any platform. Use PurePosixPath and PureWindowsPath for working with paths from another OS without filesystem access.
Can I use pathlib Path objects with open() and other functions?
Yes. Since Python 3.6, Path objects implement the os.fspath() protocol, so they work everywhere a string path is accepted: open(), json.load(), csv.reader(), shutil.copy(), subprocess.run(), and most third-party libraries. You can also convert to a string with str(path) for older libraries.
How do I recursively find all files matching a pattern?
Use the rglob() method. For example, Path('.').rglob('*.py') finds all Python files in the current directory and all subdirectories. This is equivalent to glob('**/*.py'). The method returns a generator, so it handles large directory trees efficiently without loading all results into memory.
What is the difference between resolve() and absolute()?
resolve() returns the absolute path with symlinks resolved and .. components eliminated — the canonical, real filesystem path. absolute() returns the absolute path without resolving symlinks or normalizing .. segments. Use resolve() in most cases for a deterministic, normalized path. Use resolve(strict=True) to also verify the path exists.
Is pathlib slower than os.path?
Path object creation is 3–5x slower than raw os.path string operations in microbenchmarks. However, the difference is microseconds per operation — negligible in real applications where filesystem I/O dominates. Python 3.12+ includes significant pathlib performance improvements. Only in extreme scenarios processing millions of paths in tight loops would the overhead matter; for everything else, choose pathlib for readability.
Conclusion
pathlib is the modern standard for working with filesystem paths in Python. The / operator for joining, properties like .stem and .suffix for components, and built-in methods like read_text() and glob() make your code shorter and more expressive. Every os.path pattern has a cleaner pathlib equivalent, and since Python 3.6 Path objects work with virtually every function that accepts file paths.
Start by replacing os.path.join() calls in your existing code with the / operator. Then adopt read_text() and write_text() for simple file I/O. Before long, you will never reach for os.path again.