Python Dataclasses: The Complete Guide for 2026
Python dataclasses, introduced in Python 3.7 via PEP 557, eliminate the boilerplate of writing __init__, __repr__, and __eq__ methods for classes that primarily store data. You annotate your fields with types, apply the @dataclass decorator, and Python generates the rest. They are part of the standard library, require zero dependencies, and work seamlessly with type checkers like mypy and pyright.
This guide covers everything from basic usage to advanced patterns: the field() function, __post_init__, frozen and slotted dataclasses, inheritance, serialization, comparison with alternatives, and real-world best practices. All examples target Python 3.10+ unless noted otherwise.
Table of Contents
- What Are Dataclasses and Why They Exist
- @dataclass Decorator Basics
- Field Types and Default Values
- The field() Function
- The __post_init__ Method
- Inheritance with Dataclasses
- Frozen Dataclasses (Immutable)
- Dataclass vs NamedTuple vs TypedDict
- Dataclasses with __slots__
- match_args and kw_only (Python 3.10+)
- Serialization (asdict, astuple, JSON)
- Validation Patterns
- Dataclasses vs Pydantic vs attrs
- Real-World Patterns and Best Practices
- FAQ
1. What Are Dataclasses and Why They Exist
Before dataclasses, creating a simple data-holding class meant repeating every field in __init__, __repr__, and __eq__. Dataclasses generate those methods from annotated class variables:
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0

# __init__, __repr__, and __eq__ are generated automatically
p = Point(1.0, 2.0)
print(p)                     # Point(x=1.0, y=2.0, z=0.0)
print(p == Point(1.0, 2.0))  # True
Dataclasses are in the standard library (from dataclasses import dataclass), work with every type checker, and add zero runtime overhead beyond normal class instantiation. They are the right choice whenever you need a structured data container and do not need runtime validation.
2. @dataclass Decorator Basics
The @dataclass decorator accepts several parameters that control which methods are generated:
from dataclasses import dataclass

@dataclass(
    init=True,          # Generate __init__ (default: True)
    repr=True,          # Generate __repr__ (default: True)
    eq=True,            # Generate __eq__; != falls back to it (default: True)
    order=False,        # Generate __lt__, __le__, __gt__, __ge__ (default: False)
    unsafe_hash=False,  # Force generation of __hash__ (default: False)
    frozen=False,       # Make instances immutable (default: False)
    match_args=True,    # Generate __match_args__ for pattern matching (3.10+)
    kw_only=False,      # All fields keyword-only (3.10+)
    slots=False,        # Generate __slots__ (3.10+)
)
class Config:
    host: str
    port: int
    debug: bool = False
The most common combinations:
from dataclasses import dataclass

# Simple data container (default)
@dataclass
class User:
    name: str
    age: int

# Sortable dataclass (enables <, <=, >, >=)
@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int

v1 = Version(1, 2, 3)
v2 = Version(2, 0, 0)
print(v1 < v2)           # True
print(sorted([v2, v1]))
# [Version(major=1, minor=2, patch=3), Version(major=2, minor=0, patch=0)]

# Immutable + hashable
@dataclass(frozen=True)
class Color:
    r: int
    g: int
    b: int

colors = {Color(255, 0, 0): "red", Color(0, 255, 0): "green"}
print(colors[Color(255, 0, 0)])  # red
When order=True, comparison uses the tuple of all fields in the order they are defined. Fields listed first have higher comparison priority.
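To make the tuple comparison concrete, here is a minimal sketch. The Task class and its fields are hypothetical names chosen for illustration; the point is that earlier fields dominate the ordering and that a field marked field(compare=False) is left out of the comparison altogether.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                     # compared first
    due_day: int                      # tie-breaker when priorities match
    name: str = field(compare=False)  # ignored by ==, <, <=, >, >=


tasks = [Task(2, 1, "write"), Task(1, 9, "plan"), Task(1, 3, "review")]
ordered = sorted(tasks)
print([t.name for t in ordered])  # ['review', 'plan', 'write']

# name is excluded from comparison, so these two are equal
print(Task(1, 3, "a") == Task(1, 3, "b"))  # True
```

Excluding a field this way is handy for labels or payloads that should ride along with a sort key without influencing it.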
3. Field Types and Default Values
Fields without defaults must come before fields with defaults, just like function arguments:
from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime

@dataclass
class Article:
    title: str                                     # Required (no default)
    author: str                                    # Required
    published: bool = False                        # Default value
    views: int = 0
    tags: list[str] = field(default_factory=list)  # Mutable default
    created_at: Optional[datetime] = None
Warning: mutable default values. Never use a mutable object (list, dict, set) as a default value directly, because every instance would share the same object. The @dataclass decorator refuses such defaults and raises a ValueError when the class is defined. Always use field(default_factory=list), field(default_factory=dict), or field(default_factory=set) instead.
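The following sketch shows both halves of that warning: the error raised at class-definition time, and the independent lists you get from default_factory. The class name Broken is a hypothetical placeholder.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class Broken:
        tags: list[str] = []  # rejected while the class is being created
except ValueError as exc:
    error_message = str(exc)
    print(error_message)


@dataclass
class Article:
    title: str
    tags: list[str] = field(default_factory=list)  # new list per instance


a, b = Article("first"), Article("second")
a.tags.append("python")
print(a.tags, b.tags)  # ['python'] []
```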
4. The field() Function
The field() function gives fine-grained control over individual fields. It accepts several parameters that affect initialization, representation, comparison, and hashing:
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Record:
    # default_factory: callable that produces the default value
    items: list[str] = field(default_factory=list)

    # repr=False: hide from __repr__ output
    _cache: dict = field(default_factory=dict, repr=False)

    # compare=False: exclude from __eq__ and ordering
    debug_id: int = field(default=0, compare=False)

    # hash=False: exclude from __hash__ (when frozen=True or unsafe_hash=True)
    mutable_data: list = field(default_factory=list, hash=False)

    # init=False: not part of __init__, set in __post_init__ or directly
    computed: str = field(init=False, default="")

    # kw_only=True: must be passed as keyword argument (3.10+)
    verbose: bool = field(default=False, kw_only=True)

    # metadata: arbitrary info (not used by dataclasses itself)
    name: str = field(default="", metadata={"max_length": 100, "db_column": "record_name"})
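Since metadata is ignored by the dataclasses machinery, it only becomes useful when your own code reads it back. A minimal sketch of that, using dataclasses.fields(); the helper column_map and the trimmed-down Record class are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Record:
    name: str = field(default="", metadata={"max_length": 100, "db_column": "record_name"})
    debug_id: int = field(default=0, compare=False)


def column_map(cls) -> dict[str, str]:
    """Map field names to column names, falling back to the field name."""
    return {f.name: f.metadata.get("db_column", f.name) for f in fields(cls)}


print(column_map(Record))  # {'name': 'record_name', 'debug_id': 'debug_id'}
```

Each Field object exposes metadata as a read-only mapping, so .get() with a fallback works for fields that declare no metadata at all.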
A common pattern uses field(init=False) with __post_init__ for computed values:
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)
    perimeter: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height
        self.perimeter = 2 * (self.width + self.height)

r = Rectangle(5.0, 3.0)
print(r.area)       # 15.0
print(r.perimeter)  # 16.0
print(r)            # Rectangle(width=5.0, height=3.0, area=15.0, perimeter=16.0)
5. The __post_init__ Method
__post_init__ runs immediately after the generated __init__ completes. Use it for validation, computed fields, type coercion, and any logic that depends on multiple fields:
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DateRange:
    start: str
    end: str
    days: int = field(init=False)

    def __post_init__(self):
        start_dt = datetime.fromisoformat(self.start)
        end_dt = datetime.fromisoformat(self.end)
        if end_dt <= start_dt:
            raise ValueError(f"end ({self.end}) must be after start ({self.start})")
        self.days = (end_dt - start_dt).days

dr = DateRange("2026-01-01", "2026-03-01")
print(dr.days)  # 59
__post_init__ also works with InitVar pseudo-fields: parameters that are passed to __init__ and forwarded to __post_init__ but not stored as fields:
from dataclasses import dataclass, field, InitVar
import hashlib

@dataclass
class User:
    username: str
    email: str
    raw_password: InitVar[str]  # Passed to __init__ but not stored
    password_hash: str = field(init=False)

    def __post_init__(self, raw_password: str):
        # raw_password is available here but not as self.raw_password.
        # Plain SHA-256 keeps the example short; real code should use a
        # dedicated password-hashing scheme.
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()

user = User("alice", "alice@example.com", "secret123")
print(user.password_hash)    # sha256 hash
# print(user.raw_password)   # AttributeError: no such attribute
InitVar is powerful for fields that should influence initialization but not be part of the stored state.
6. Inheritance with Dataclasses
Dataclasses support inheritance. Child fields are appended after parent fields. The generated __init__ includes all fields from the inheritance chain:
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Base:
    id: int
    created_at: datetime = field(default_factory=datetime.now)

@dataclass
class User(Base):
    name: str = ""
    email: str = ""

@dataclass
class Admin(User):
    permissions: list[str] = field(default_factory=list)

admin = Admin(id=1, name="Alice", email="alice@corp.com", permissions=["read", "write"])
print(admin)
# Admin(id=1, created_at=datetime.datetime(...), name='Alice', email='alice@corp.com',
#       permissions=['read', 'write'])
Default value ordering caveat: If a parent dataclass has fields with defaults, all child fields must also have defaults. Otherwise Python raises a TypeError:
from dataclasses import dataclass

@dataclass
class Parent:
    x: int = 0  # Has default

# WRONG: required field after field with default
# @dataclass
# class Child(Parent):
#     y: int  # TypeError: non-default argument 'y' follows default argument

# FIX: Use kw_only (Python 3.10+)
@dataclass(kw_only=True)
class Child(Parent):
    y: int  # Works: y is keyword-only and is placed after x in __init__

c = Child(x=1, y=2)  # y must be passed by keyword; x may still be positional
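The kw_only=True parameter (Python 3.10+) solves the default ordering problem: keyword-only fields are moved to the end of the generated __init__ signature, so a required keyword-only field may follow fields that have defaults. On the decorator it applies only to the fields declared in that class; inherited fields such as x stay positional unless the parent class is also declared with kw_only=True.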
7. Frozen Dataclasses (Immutable)
Frozen dataclasses prevent field modification after creation. Any assignment raises FrozenInstanceError. With the default eq=True, a frozen dataclass also gets a generated __hash__, so instances work as dictionary keys and set members as long as every field value is itself hashable:
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

c = Coordinate(40.7128, -74.0060)
# c.lat = 41.0  # FrozenInstanceError

# Frozen dataclasses are hashable - use as dict keys or in sets
locations = {Coordinate(40.7128, -74.0060): "New York", Coordinate(51.5074, -0.1278): "London"}
print(locations[Coordinate(40.7128, -74.0060)])  # New York
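One caveat worth a short sketch: the generated __hash__ hashes the field values, so a frozen dataclass that holds an unhashable value (a list, for instance) still fails when hashed. The classes Tagged and TaggedSafe are hypothetical names used only to show the contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    name: str
    tags: list[str]  # lists are unhashable


@dataclass(frozen=True)
class TaggedSafe:
    name: str
    tags: tuple[str, ...]  # tuples of strings are hashable


try:
    hash(Tagged("a", ["x"]))
except TypeError as exc:
    print(f"TypeError: {exc}")

same = hash(TaggedSafe("a", ("x",))) == hash(TaggedSafe("a", ("x",)))
print(same)  # True
```

Preferring tuples and frozensets for the fields of a frozen dataclass keeps instances usable as keys.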
To "modify" a frozen dataclass, use dataclasses.replace() which creates a new instance with updated fields:
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    host: str
    port: int
    debug: bool = False

prod = Config(host="api.example.com", port=443)
dev = replace(prod, host="localhost", port=8000, debug=True)
print(prod)  # Config(host='api.example.com', port=443, debug=False)
print(dev)   # Config(host='localhost', port=8000, debug=True)
Frozen dataclasses are ideal for configuration objects, value objects in domain-driven design, cache keys, and any data that should be treated as a value rather than an entity.
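Freezing also blocks ordinary assignment inside __post_init__, which matters for computed fields. A minimal sketch of the usual workaround, calling object.__setattr__ directly; the Rectangle class here is a frozen variant written for this illustration.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        # self.area = ... would raise FrozenInstanceError here,
        # so bypass the frozen __setattr__ for this one assignment.
        object.__setattr__(self, "area", self.width * self.height)


r = Rectangle(5.0, 3.0)
print(r.area)  # 15.0

try:
    r.area = 0.0
except FrozenInstanceError:
    print("still frozen after construction")
```

Keeping the bypass confined to __post_init__ preserves the guarantee that instances do not change once construction has finished.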
8. Dataclass vs NamedTuple vs TypedDict
Python offers three main ways to create typed data containers:
from dataclasses import dataclass
from typing import NamedTuple, TypedDict

@dataclass
class UserDC:  # Mutable class, most flexible
    name: str
    age: int

class UserNT(NamedTuple):  # Immutable tuple, supports indexing/unpacking
    name: str
    age: int

class UserTD(TypedDict):  # Typed dict, for JSON-like data
    name: str
    age: int

dc = UserDC("Alice", 30)
dc.age = 31                                 # Mutable
nt = UserNT("Alice", 30)
name, age = nt                              # Unpackable, immutable
td: UserTD = {"name": "Alice", "age": 30}   # Dict access
Key differences: Dataclasses are mutable unless frozen, hashable by default only when frozen (or when unsafe_hash=True or eq=False is set), support full inheritance and methods, and can opt in to __slots__ with slots=True (3.10+). NamedTuples are always immutable, hashable, indexable, and memory-efficient but have limited inheritance. TypedDicts are plain dicts at runtime: mutable, checked only by static type checkers, unable to carry methods, and not hashable.
When to use each: Use dataclasses for general-purpose data containers. Use NamedTuple for lightweight immutable records with tuple compatibility. Use TypedDict when working with JSON data or dictionaries that need type checking.
9. Dataclasses with __slots__
Python 3.10 added slots=True to the @dataclass decorator. This generates a __slots__ attribute, storing fields in a compact array instead of a per-instance dictionary:
from dataclasses import dataclass
import sys

@dataclass
class RegularPoint:
    x: float
    y: float
    z: float

@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float
    z: float

regular = RegularPoint(1.0, 2.0, 3.0)
slotted = SlottedPoint(1.0, 2.0, 3.0)

# sys.getsizeof() does not count the per-instance __dict__, so add it explicitly.
# Exact byte counts vary by Python version and platform.
print(sys.getsizeof(regular) + sys.getsizeof(regular.__dict__))  # object + its __dict__
print(sys.getsizeof(slotted))                                    # no __dict__ to add
# slotted.w = 4.0  # AttributeError: cannot add arbitrary attributes
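Benefits: less memory per instance (the 30–40% figure often quoted is a rough guide; the actual saving depends on the Python version and the number of fields) and typically faster attribute access, since a slot is read at a fixed offset rather than through a dict lookup. The tradeoff is that you cannot dynamically add attributes. Combine with frozen=True for compact, immutable records when handling bulk data: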
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class ImmutableRecord:
    id: int
    name: str
    value: float

records = [ImmutableRecord(i, f"item_{i}", i * 1.5) for i in range(100_000)]
10. match_args and kw_only (Python 3.10+)
Python 3.10 added two important parameters to the @dataclass decorator:
match_args: Structural Pattern Matching
match_args=True (the default in 3.10+) generates a __match_args__ tuple that enables structural pattern matching with the match/case statement:
from dataclasses import dataclass

@dataclass
class Command:
    action: str
    target: str
    value: str = ""

def handle(cmd: Command):
    match cmd:
        case Command("create", target):
            print(f"Creating {target}")
        case Command("delete", target) if target != "admin":
            print(f"Deleting {target}")
        case Command("update", target, value):
            print(f"Updating {target} to {value}")
        case _:
            print(f"Unknown command: {cmd}")

handle(Command("create", "user"))         # Creating user
handle(Command("update", "name", "Bob"))  # Updating name to Bob
kw_only: Keyword-Only Fields
kw_only=True makes every field declared in the class keyword-only, improving readability at call sites and solving the inheritance default-ordering problem:
from dataclasses import dataclass, field

@dataclass(kw_only=True)
class Connection:
    host: str
    port: int
    timeout: float = 30.0

conn = Connection(host="db.example.com", port=5432)
# conn = Connection("db.example.com", 5432)  # TypeError: keyword-only

# Per-field kw_only: mix positional and keyword-only arguments
@dataclass
class Request:
    method: str  # Positional
    url: str     # Positional
    timeout: float = field(default=30.0, kw_only=True)  # Keyword-only
    headers: dict = field(default_factory=dict, kw_only=True)

req = Request("GET", "https://api.example.com", timeout=10.0)
11. Serialization (asdict, astuple, JSON)
The dataclasses module provides asdict() and astuple() for converting instances to dictionaries and tuples:
from dataclasses import dataclass, asdict, astuple, field
import json

@dataclass
class Address:
    street: str
    city: str
    country: str = "US"

@dataclass
class Employee:
    name: str
    email: str
    address: Address
    tags: list[str] = field(default_factory=list)

emp = Employee("Alice", "alice@example.com", Address("123 Main St", "NY"), ["senior"])
d = asdict(emp)   # Recursively converts nested dataclasses to dicts
t = astuple(emp)  # Recursively converts to nested tuples
json_str = json.dumps(asdict(emp), indent=2)  # JSON serialization
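The standard library has no inverse of asdict(), so going from a dict back to nested dataclasses is up to you. A minimal hand-written sketch follows; employee_from_dict is a hypothetical helper, not part of the dataclasses module.

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Address:
    street: str
    city: str
    country: str = "US"


@dataclass
class Employee:
    name: str
    email: str
    address: Address
    tags: list[str] = field(default_factory=list)


def employee_from_dict(data: dict) -> Employee:
    """Hypothetical helper: rebuild the nested Address by hand."""
    return Employee(
        name=data["name"],
        email=data["email"],
        address=Address(**data["address"]),
        tags=list(data.get("tags", [])),
    )


emp = Employee("Alice", "alice@example.com", Address("123 Main St", "NY"), ["senior"])
d = asdict(emp)
print(d["address"])  # {'street': '123 Main St', 'city': 'NY', 'country': 'US'}

restored = employee_from_dict(d)
print(restored == emp)  # True

# asdict() copies containers, so editing the dict leaves emp untouched
d["tags"].append("edited")
print(emp.tags)  # ['senior']
```

For deeply nested or untrusted input, a validation library is usually a better fit than hand-written helpers like this one.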
For types that json.dumps cannot handle (like datetime), use a custom encoder:
from dataclasses import dataclass, asdict
from datetime import datetime
import json

@dataclass
class Event:
    name: str
    timestamp: datetime

class DataclassEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for objects json cannot serialize on its own
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

event = Event("deploy", datetime(2026, 2, 12, 14, 30))
json_str = json.dumps(asdict(event), cls=DataclassEncoder, indent=2)
# {
#   "name": "deploy",
#   "timestamp": "2026-02-12T14:30:00"
# }
12. Validation Patterns
Dataclasses do not validate types at runtime. If you pass a string where an int is expected, it is silently accepted. Add validation in __post_init__:
from dataclasses import dataclass

@dataclass
class UserProfile:
    username: str
    age: int
    email: str

    def __post_init__(self):
        if not isinstance(self.username, str) or len(self.username) < 3:
            raise ValueError(f"Username must be 3+ chars, got: {self.username!r}")
        if not isinstance(self.age, int) or not (0 <= self.age <= 150):
            raise ValueError(f"Age must be 0-150, got: {self.age!r}")
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email!r}")
        self.username = self.username.lower().strip()
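Writing isinstance checks by hand for every field gets repetitive. One possible way to generalize them is to loop over dataclasses.fields() in a shared __post_init__. The sketch below is deliberately limited: the TypeChecked base class is hypothetical, it only checks fields annotated with a plain class such as str or int, and it skips generics like list[str] as well as string annotations.

```python
from dataclasses import dataclass, fields
from typing import get_origin


@dataclass
class TypeChecked:
    """Hypothetical base class: checks fields annotated with a plain class."""

    def __post_init__(self):
        for f in fields(self):
            # Skip generics (list[str], ...) and string annotations
            if get_origin(f.type) is None and isinstance(f.type, type):
                value = getattr(self, f.name)
                if not isinstance(value, f.type):
                    raise TypeError(
                        f"{f.name} must be {f.type.__name__}, "
                        f"got {type(value).__name__}"
                    )


@dataclass
class UserProfile(TypeChecked):
    username: str
    age: int


ok = UserProfile("alice", 30)

try:
    UserProfile("alice", "30")
except TypeError as exc:
    print(exc)  # age must be int, got str
```

Note that bool is a subclass of int, so a check like this accepts True for an int field; value-range rules still belong in explicit validation code.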
For more robust validation, use pydantic.dataclasses as a drop-in replacement that adds full Pydantic validation while keeping dataclass syntax:
from pydantic.dataclasses import dataclass  # Drop-in replacement
from pydantic import field_validator

@dataclass
class StrictUser:
    name: str
    age: int
    email: str

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        if v < 0 or v > 150:
            raise ValueError("Age must be between 0 and 150")
        return v

user = StrictUser(name="Alice", age="30", email="a@b.com")  # Coerces "30" to 30
13. Dataclasses vs Pydantic vs attrs
| Feature | dataclasses | Pydantic | attrs |
|---|---|---|---|
| Stdlib | Yes (3.7+) | No | No |
| Runtime validation | Manual | Automatic | Optional |
| Type coercion | No | Yes | Opt-in (converters) |
| JSON serialization | asdict + json | Built-in (Rust) | Via cattrs |
| JSON Schema | No | Built-in | No |
| Performance | Fastest | Fast (Rust core) | Fast (pure Python) |
| Best for | Internal data | External data | Complex hierarchies |
Decision guide:
- Use dataclasses when your data is internal, types are trusted, and you want zero dependencies. Ideal for domain models, internal DTOs, and value objects.
- Use Pydantic when data crosses a trust boundary (API inputs, config files, user data). Validation and coercion matter more than raw performance.
- Use attrs when you need advanced features like validators on individual fields, factory functions, and complex class hierarchies with slotted classes before Python 3.10.
14. Real-World Patterns and Best Practices
Pattern 1: Builder pattern with replace(). Use dataclasses.replace() for immutable updates, creating a fluent builder:
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    table: str
    conditions: tuple[str, ...] = ()
    limit: int | None = None
    offset: int = 0

    def where(self, condition: str) -> "Query":
        return replace(self, conditions=self.conditions + (condition,))

    def with_limit(self, n: int) -> "Query":
        return replace(self, limit=n)

    def with_offset(self, n: int) -> "Query":
        return replace(self, offset=n)

query = (
    Query("users")
    .where("age > 18")
    .where("active = true")
    .with_limit(10)
    .with_offset(20)
)
print(query)
# Query(table='users', conditions=('age > 18', 'active = true'), limit=10, offset=20)
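A useful property of replace() in builder-style code is that it constructs the copy through __init__, so __post_init__ validation runs again on every step. A short sketch, with a hypothetical Page class:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    limit: int = 10
    offset: int = 0

    def __post_init__(self):
        if self.limit <= 0:
            raise ValueError(f"limit must be positive, got {self.limit}")


page = Page()
next_page = replace(page, offset=page.offset + page.limit)
print(next_page)  # Page(limit=10, offset=10)

try:
    replace(page, limit=0)  # validation runs again for the copy
except ValueError as exc:
    print(exc)  # limit must be positive, got 0
```

This means an invalid intermediate state cannot slip through a chain of builder calls.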
Pattern 2: Registry with ClassVar. Use ClassVar for class-level data that is not a field:
from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Plugin:
    name: str
    version: str
    registry: ClassVar[dict[str, "Plugin"]] = {}

    def __post_init__(self):
        Plugin.registry[self.name] = self

Plugin("auth", "1.0.0")
print(Plugin.registry["auth"])  # Plugin(name='auth', version='1.0.0')
Pattern 3: Immutable event or message.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass(frozen=True, slots=True)
class Event:
    type: str
    payload: dict  # frozen is shallow: the dict's contents can still be changed
    id: str = field(default_factory=lambda: str(uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

event = Event("user.created", {"user_id": 42})
General best practices:
- Use frozen=True by default unless you have a specific reason to mutate. Immutable objects are easier to reason about and safe to share across threads.
- Use slots=True on Python 3.10+ for better memory usage and faster attribute access.
- Use kw_only=True for dataclasses with more than 3–4 fields to improve readability at call sites.
- Use field(repr=False) to hide large or sensitive fields from __repr__ output.
- Use field(compare=False) for metadata fields (timestamps, IDs) that should not affect equality.
- Prefer __post_init__ over __init__ override. If you override __init__, you lose all generated logic.
- Use ClassVar for class-level data and InitVar for init-only parameters.
- For complex validation needs, consider Pydantic dataclasses as a drop-in replacement.
Frequently Asked Questions
What are Python dataclasses and why should I use them?
Python dataclasses (introduced in Python 3.7) are a decorator-based way to create classes that primarily store data. The @dataclass decorator automatically generates __init__, __repr__, and __eq__ methods based on your type-annotated fields. Use them to eliminate boilerplate when you need structured data containers with type hints, comparison support, and clean string representations without writing repetitive constructor code.
What is the difference between dataclasses and Pydantic?
Dataclasses are part of Python's standard library and generate boilerplate methods like __init__ and __repr__, but perform zero runtime validation. Pydantic validates and coerces every field at runtime, raises clear errors for invalid data, and provides built-in JSON serialization. Use dataclasses for internal data structures where you trust the types. Use Pydantic when data crosses a trust boundary such as API requests, config files, or user input.
How do I make a dataclass immutable (frozen)?
Use @dataclass(frozen=True) to make instances immutable. Frozen dataclasses raise FrozenInstanceError if you try to assign to any field after creation. They are also hashable by default (provided their field values are hashable), meaning you can use them as dictionary keys or in sets. Frozen dataclasses are ideal for configuration objects, value objects in domain-driven design, and any data that should not change after initialization.
What does __slots__ do in a dataclass?
Adding slots=True to @dataclass (Python 3.10+) generates a __slots__ attribute, which stores fields in fixed slots instead of a per-instance __dict__. This reduces memory usage (30–40% is a commonly quoted rough figure; the real saving depends on the Python version and the number of fields) and typically speeds up attribute access. The tradeoff is that you cannot add arbitrary attributes to instances at runtime. Use slots=True for classes you create many instances of, such as records in data processing pipelines.
Can I add validation to dataclasses?
Yes, use the __post_init__ method to add validation logic that runs immediately after __init__. Inside __post_init__, check field values and raise ValueError or TypeError for invalid data. For more advanced validation, you can use pydantic.dataclasses.dataclass as a drop-in replacement that adds full Pydantic validation to dataclass syntax. You can also use third-party libraries like beartype for runtime type checking.
Conclusion
Python dataclasses are the standard tool for creating data containers in Python. They eliminate constructor and comparison boilerplate, integrate perfectly with type checkers, and require zero external dependencies. With Python 3.10+ features like slots=True, kw_only=True, and match_args, they are more powerful than ever.
Start with @dataclass(frozen=True, slots=True) for most new classes. Add kw_only=True when you have many fields. Use __post_init__ for validation and computed fields. When you need runtime validation for untrusted data, reach for Pydantic — but for internal data structures, dataclasses are all you need.