Python Dataclasses: The Complete Guide for 2026

February 12, 2026 · 20 min read

Python dataclasses, introduced in Python 3.7 via PEP 557, eliminate the boilerplate of writing __init__, __repr__, and __eq__ methods for classes that primarily store data. You annotate your fields with types, apply the @dataclass decorator, and Python generates the rest. They are part of the standard library, require zero dependencies, and work seamlessly with type checkers like mypy and pyright.

This guide covers everything from basic usage to advanced patterns: the field() function, __post_init__, frozen and slotted dataclasses, inheritance, serialization, comparison with alternatives, and real-world best practices. All examples target Python 3.10+ unless noted otherwise.

⚙ Related resources: Learn about runtime validation with Pydantic, master Python Type Hints, and set up isolated environments with our Python Virtual Environments Guide.

1. What Are Dataclasses and Why They Exist

Before dataclasses, creating a simple data-holding class meant repeating every field in __init__, __repr__, and __eq__. Dataclasses generate those methods from annotated class variables:

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0

# __init__, __repr__, and __eq__ are generated automatically
p = Point(1.0, 2.0)
print(p)         # Point(x=1.0, y=2.0, z=0.0)
print(p == Point(1.0, 2.0))  # True

Dataclasses are in the standard library (from dataclasses import dataclass), are understood by the major type checkers, and add no per-instance overhead beyond a normal class: the methods are generated once, when the class is defined. They are the right choice whenever you need a structured data container and do not need runtime validation.

2. @dataclass Decorator Basics

The @dataclass decorator accepts several parameters that control which methods are generated:

from dataclasses import dataclass

@dataclass(
    init=True,       # Generate __init__ (default: True)
    repr=True,       # Generate __repr__ (default: True)
    eq=True,         # Generate __eq__ (default: True); != falls back to it
    order=False,     # Generate __lt__, __le__, __gt__, __ge__ (default: False)
    unsafe_hash=False,  # Force a generated __hash__ even if not frozen (default: False)
    frozen=False,    # Make instances immutable (default: False)
    match_args=True, # Generate __match_args__ for pattern matching (3.10+)
    kw_only=False,   # All fields keyword-only (3.10+)
    slots=False,     # Generate __slots__ (3.10+)
)
class Config:
    host: str
    port: int
    debug: bool = False

The most common combinations:

# Simple data container (default)
@dataclass
class User:
    name: str
    age: int

# Sortable dataclass (enables <, <=, >, >=)
@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int

v1 = Version(1, 2, 3)
v2 = Version(2, 0, 0)
print(v1 < v2)       # True
print(sorted([v2, v1]))
# [Version(major=1, minor=2, patch=3), Version(major=2, minor=0, patch=0)]

# Immutable + hashable
@dataclass(frozen=True)
class Color:
    r: int
    g: int
    b: int

colors = {Color(255, 0, 0): "red", Color(0, 255, 0): "green"}
print(colors[Color(255, 0, 0)])  # red

When order=True, comparison uses the tuple of all fields in the order they are defined. Fields listed first have higher comparison priority.
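To make the ordering rule concrete, here is a small sketch. The Release class and its codename field are made up for illustration; compare=False is the field() option that leaves a field out of both equality and ordering.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Release:
    major: int
    minor: int
    # Hypothetical field, excluded from ==, <, <=, >, >= via compare=False
    codename: str = field(default="", compare=False)

a = Release(1, 9, "zebra")
b = Release(1, 10, "aardvark")

# Ordering follows the tuple (major, minor); codename is ignored
print(a < b)                                     # True
print((a.major, a.minor) < (b.major, b.minor))   # True, the same comparison
print(Release(1, 9, "x") == Release(1, 9, "y"))  # True
```

If a field should not influence sorting, excluding it this way is usually simpler than reordering the class definition.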

3. Field Types and Default Values

Fields without defaults must come before fields with defaults, just like function arguments:

from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime

@dataclass
class Article:
    title: str                               # Required (no default)
    author: str                              # Required
    published: bool = False                  # Default value
    views: int = 0
    tags: list[str] = field(default_factory=list)  # Mutable default
    created_at: Optional[datetime] = None

Warning: mutable default values. Never use a mutable object (list, dict, set) as a default value directly — every instance would share the same object. Python raises a ValueError if you try. Always use field(default_factory=list), field(default_factory=dict), or field(default_factory=set) instead.
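The following sketch shows both halves of that warning: the error raised at class-definition time, and the independent lists that default_factory produces. The Broken and Post classes are illustrative names only.

```python
from dataclasses import dataclass, field

# A bare list default is rejected as soon as the class is created
try:
    @dataclass
    class Broken:
        tags: list[str] = []
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

@dataclass
class Post:
    tags: list[str] = field(default_factory=list)

# Each instance gets its own list from the factory
first, second = Post(), Post()
first.tags.append("python")
print(first.tags)   # ['python']
print(second.tags)  # []
```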

4. The field() Function

The field() function gives fine-grained control over individual fields. It accepts several parameters that affect initialization, representation, comparison, and hashing:

from dataclasses import dataclass, field
from typing import Any

@dataclass
class Record:
    # default_factory: callable that produces the default value
    items: list[str] = field(default_factory=list)

    # repr=False: hide from __repr__ output
    _cache: dict = field(default_factory=dict, repr=False)

    # compare=False: exclude from __eq__ and ordering
    debug_id: int = field(default=0, compare=False)

    # hash=False: exclude from __hash__ (when frozen=True or unsafe_hash=True)
    mutable_data: list = field(default_factory=list, hash=False)

    # init=False: not part of __init__, set in __post_init__ or directly
    computed: str = field(init=False, default="")

    # kw_only=True: must be passed as keyword argument (3.10+)
    verbose: bool = field(default=False, kw_only=True)

    # metadata: arbitrary info (not used by dataclasses itself)
    name: str = field(default="", metadata={"max_length": 100, "db_column": "record_name"})
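The metadata mapping is only useful if something reads it back. A minimal sketch, assuming a made-up Row class and a hypothetical column_map helper, that inspects it through dataclasses.fields():

```python
from dataclasses import dataclass, field, fields

@dataclass
class Row:
    name: str = field(default="", metadata={"max_length": 100, "db_column": "record_name"})
    note: str = ""

def column_map(cls) -> dict[str, str]:
    """Hypothetical helper: map field names to column names.

    Falls back to the field name when no "db_column" metadata is set.
    """
    return {f.name: f.metadata.get("db_column", f.name) for f in fields(cls)}

print(column_map(Row))  # {'name': 'record_name', 'note': 'note'}
```

The keys inside metadata are entirely your own convention; dataclasses stores the mapping read-only and never interprets it.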

A common pattern uses field(init=False) with __post_init__ for computed values:

from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)
    perimeter: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height
        self.perimeter = 2 * (self.width + self.height)

r = Rectangle(5.0, 3.0)
print(r.area)       # 15.0
print(r.perimeter)  # 16.0
print(r)  # Rectangle(width=5.0, height=3.0, area=15.0, perimeter=16.0)

5. The __post_init__ Method

__post_init__ runs immediately after the generated __init__ completes. Use it for validation, computed fields, type coercion, and any logic that depends on multiple fields:

from dataclasses import dataclass, field

@dataclass
class DateRange:
    start: str
    end: str
    days: int = field(init=False)

    def __post_init__(self):
        from datetime import datetime
        start_dt = datetime.fromisoformat(self.start)
        end_dt = datetime.fromisoformat(self.end)

        if end_dt <= start_dt:
            raise ValueError(f"end ({self.end}) must be after start ({self.start})")

        self.days = (end_dt - start_dt).days

dr = DateRange("2026-01-01", "2026-03-01")
print(dr.days)  # 59

__post_init__ also receives InitVar pseudo-fields: parameters that are passed to __init__ and forwarded to __post_init__, but not stored on the instance:

from dataclasses import dataclass, field, InitVar
import hashlib

@dataclass
class User:
    username: str
    email: str
    raw_password: InitVar[str]  # Passed to __init__ but not stored
    password_hash: str = field(init=False)

    def __post_init__(self, raw_password: str):
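        # raw_password is available here but not as self.raw_password.
        # Unsalted SHA-256 keeps the example short; real password storage
        # should use a salted KDF such as hashlib.pbkdf2_hmac.
        self.password_hash = hashlib.sha256(raw_password.encode()).hexdigest()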

user = User("alice", "alice@example.com", "secret123")
print(user.password_hash)  # sha256 hash
# print(user.raw_password)  # AttributeError: no such attribute

InitVar is powerful for fields that should influence initialization but not be part of the stored state.

6. Inheritance with Dataclasses

Dataclasses support inheritance. Child fields are appended after parent fields. The generated __init__ includes all fields from the inheritance chain:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Base:
    id: int
    created_at: datetime = field(default_factory=datetime.now)

@dataclass
class User(Base):
    name: str = ""
    email: str = ""

@dataclass
class Admin(User):
    permissions: list[str] = field(default_factory=list)

admin = Admin(id=1, name="Alice", email="alice@corp.com", permissions=["read", "write"])
print(admin)
# Admin(id=1, created_at=datetime.datetime(...), name='Alice', email='alice@corp.com',
#       permissions=['read', 'write'])

Default value ordering caveat: If a parent dataclass has fields with defaults, all child fields must also have defaults. Otherwise Python raises a TypeError:

@dataclass
class Parent:
    x: int = 0  # Has default

# WRONG: required field after field with default
# @dataclass
# class Child(Parent):
#     y: int  # TypeError: non-default argument 'y' follows default argument

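# FIX: Use kw_only (Python 3.10+)
@dataclass(kw_only=True)
class Child(Parent):
    y: int  # Works: y is keyword-only, so it may follow a defaulted field

c = Child(x=1, y=2)  # y must be passed by keyword; x may still be positional

The kw_only=True parameter (Python 3.10+) solves the default ordering problem: it makes the fields declared in that class keyword-only, and keyword-only parameters are placed after all positional ones in the generated __init__, so the ordering constraint no longer applies to them. Fields inherited from a parent that is not kw_only remain positional.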

7. Frozen Dataclasses (Immutable)

Frozen dataclasses prevent field reassignment after creation. Any assignment raises FrozenInstanceError. With the default eq=True, frozen dataclasses also get a generated __hash__, so instances work as dictionary keys and set members as long as every field value is itself hashable:

from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

c = Coordinate(40.7128, -74.0060)
# c.lat = 41.0  # FrozenInstanceError

# Frozen dataclasses are hashable - use as dict keys or in sets
locations = {Coordinate(40.7128, -74.0060): "New York", Coordinate(51.5074, -0.1278): "London"}
print(locations[Coordinate(40.7128, -74.0060)])  # New York
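Hashing a frozen instance hashes its field values, so a single unhashable field is enough to break it. A short sketch with two illustrative classes (TaggedBad and TaggedGood are made-up names):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TaggedBad:
    name: str
    tags: list[str] = field(default_factory=list)  # list is unhashable

@dataclass(frozen=True)
class TaggedGood:
    name: str
    tags: tuple[str, ...] = ()                     # tuple of str is hashable

try:
    hash(TaggedBad("a", ["x"]))
except TypeError as exc:
    hash_error = exc
    print("cannot hash:", exc)

# Equal frozen instances collapse to one set member
unique = {TaggedGood("a", ("x",)), TaggedGood("a", ("x",))}
print(len(unique))  # 1
```

Preferring tuples and frozensets for collection fields keeps frozen dataclasses usable as keys.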

To "modify" a frozen dataclass, use dataclasses.replace() which creates a new instance with updated fields:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    host: str
    port: int
    debug: bool = False

prod = Config(host="api.example.com", port=443)
dev = replace(prod, host="localhost", port=8000, debug=True)

print(prod)  # Config(host='api.example.com', port=443, debug=False)
print(dev)   # Config(host='localhost', port=8000, debug=True)

Frozen dataclasses are ideal for configuration objects, value objects in domain-driven design, cache keys, and any data that should be treated as a value rather than an entity.
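Frozen classes raise a practical question: how does __post_init__ set a computed field when assignment is blocked? A common approach, sketched here with a made-up Circle class, is to call object.__setattr__ once during initialization:

```python
from dataclasses import dataclass, field, FrozenInstanceError
import math

@dataclass(frozen=True)
class Circle:
    radius: float
    area: float = field(init=False)

    def __post_init__(self):
        # A plain `self.area = ...` would raise FrozenInstanceError here,
        # so bypass the generated __setattr__ for this one-time setup.
        object.__setattr__(self, "area", math.pi * self.radius ** 2)

c = Circle(2.0)
print(round(c.area, 2))  # 12.57

try:
    c.radius = 3.0
except FrozenInstanceError as exc:
    frozen_error = exc
    print(type(exc).__name__)  # FrozenInstanceError
```

Keep this escape hatch inside __post_init__; using it elsewhere defeats the point of freezing the class.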

8. Dataclass vs NamedTuple vs TypedDict

Python offers three main ways to create typed data containers:

from dataclasses import dataclass
from typing import NamedTuple, TypedDict

@dataclass
class UserDC:             # Mutable class, most flexible
    name: str
    age: int

class UserNT(NamedTuple): # Immutable tuple, supports indexing/unpacking
    name: str
    age: int

class UserTD(TypedDict):  # Typed dict, for JSON-like data
    name: str
    age: int

dc = UserDC("Alice", 30); dc.age = 31      # Mutable
nt = UserNT("Alice", 30); name, age = nt    # Unpackable, immutable
td: UserTD = {"name": "Alice", "age": 30}   # Dict access

Key differences: Dataclasses are mutable unless frozen, are hashable by default only when frozen (eq=True without frozen sets __hash__ to None), support full inheritance and methods, and can use __slots__ via slots=True (3.10+). NamedTuples are always immutable, hashable when their elements are, indexable, and memory-efficient but have limited inheritance. TypedDicts are plain dicts at runtime: mutable, unhashable, checked only by static type checkers, and unable to define methods.

When to use each: Use dataclasses for general-purpose data containers. Use NamedTuple for lightweight immutable records with tuple compatibility. Use TypedDict when working with JSON data or dictionaries that need type checking.
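A few runtime checks make these differences tangible. The sketch below reuses the UserDC, UserNT, and UserTD names from the comparison above:

```python
from dataclasses import dataclass
from typing import NamedTuple, TypedDict

@dataclass
class UserDC:
    name: str
    age: int

class UserNT(NamedTuple):
    name: str
    age: int

class UserTD(TypedDict):
    name: str
    age: int

nt = UserNT("Alice", 30)
td: UserTD = {"name": "Alice", "age": 30}

print(nt[0])                     # Alice (tuple indexing)
print(nt == ("Alice", 30))       # True (it really is a tuple)
print(type(td) is dict)          # True (TypedDict is a plain dict at runtime)
print(UserDC.__hash__ is None)   # True (mutable dataclass with eq=True is unhashable)
print(UserDC("Alice", 30) == ("Alice", 30))  # False (dataclasses compare by class)
```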

9. Dataclasses with __slots__

Python 3.10 added slots=True to the @dataclass decorator. This generates a __slots__ attribute, storing fields in a compact array instead of a per-instance dictionary:

from dataclasses import dataclass
import sys

@dataclass
class RegularPoint:
    x: float
    y: float
    z: float

@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float
    z: float

regular = RegularPoint(1.0, 2.0, 3.0)
slotted = SlottedPoint(1.0, 2.0, 3.0)
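# sys.getsizeof() does not count the separate __dict__, so add it for a fair
# comparison. Exact numbers vary by Python version and platform.
print(sys.getsizeof(regular) + sys.getsizeof(regular.__dict__))  # object + __dict__
print(sys.getsizeof(slotted))                                    # object only, no __dict__
# slotted.w = 4.0  # AttributeError: cannot add arbitrary attributes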

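Benefits: less memory per instance (often cited as roughly 30–40%, though the figure depends on the Python version and the number of fields) and often slightly faster attribute access, since there is no per-instance dictionary lookup. The tradeoff is that you cannot dynamically add attributes. Combine with frozen=True for compact, immutable bulk records; note that frozen adds a small cost to __init__, which can no longer use plain assignment: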

@dataclass(frozen=True, slots=True)
class ImmutableRecord:
    id: int
    name: str
    value: float

records = [ImmutableRecord(i, f"item_{i}", i * 1.5) for i in range(100_000)]

10. match_args and kw_only (Python 3.10+)

Python 3.10 added two important parameters to the @dataclass decorator:

match_args: Structural Pattern Matching

match_args=True (the default in 3.10+) generates a __match_args__ tuple that enables structural pattern matching with the match/case statement:

from dataclasses import dataclass

@dataclass
class Command:
    action: str
    target: str
    value: str = ""

def handle(cmd: Command):
    match cmd:
        case Command("create", target):
            print(f"Creating {target}")
        case Command("delete", target) if target != "admin":
            print(f"Deleting {target}")
        case Command("update", target, value):
            print(f"Updating {target} to {value}")
        case _:
            print(f"Unknown command: {cmd}")

handle(Command("create", "user"))       # Creating user
handle(Command("update", "name", "Bob"))  # Updating name to Bob

kw_only: Keyword-Only Fields

kw_only=True makes all fields keyword-only, improving readability and solving the inheritance default-ordering problem:

from dataclasses import dataclass, field

@dataclass(kw_only=True)
class Connection:
    host: str
    port: int
    timeout: float = 30.0

conn = Connection(host="db.example.com", port=5432)
# conn = Connection("db.example.com", 5432)  # TypeError: keyword-only

# Per-field kw_only: mix positional and keyword-only arguments
@dataclass
class Request:
    method: str                                            # Positional
    url: str                                               # Positional
    timeout: float = field(default=30.0, kw_only=True)     # Keyword-only
    headers: dict = field(default_factory=dict, kw_only=True)

req = Request("GET", "https://api.example.com", timeout=10.0)

11. Serialization (asdict, astuple, JSON)

The dataclasses module provides asdict() and astuple() for converting instances to dictionaries and tuples:

from dataclasses import dataclass, asdict, astuple, field
import json

@dataclass
class Address:
    street: str
    city: str
    country: str = "US"

@dataclass
class Employee:
    name: str
    email: str
    address: Address
    tags: list[str] = field(default_factory=list)

emp = Employee("Alice", "alice@example.com", Address("123 Main St", "NY"), ["senior"])

d = asdict(emp)   # Recursively converts nested dataclasses to dicts
t = astuple(emp)  # Recursively converts to nested tuples
json_str = json.dumps(asdict(emp), indent=2)  # JSON serialization
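The standard library has no inverse of asdict(), so loading JSON back into nested dataclasses takes a little manual work. The sketch below uses a hypothetical employee_from_dict helper written for this one shape; it is not a general-purpose deserializer.

```python
from dataclasses import dataclass, asdict, field
import json

@dataclass
class Address:
    street: str
    city: str
    country: str = "US"

@dataclass
class Employee:
    name: str
    email: str
    address: Address
    tags: list[str] = field(default_factory=list)

def employee_from_dict(data: dict) -> Employee:
    """Hypothetical helper: rebuild the nested Address by hand."""
    return Employee(
        name=data["name"],
        email=data["email"],
        address=Address(**data["address"]),
        tags=list(data.get("tags", [])),
    )

emp = Employee("Alice", "alice@example.com", Address("123 Main St", "NY"), ["senior"])
restored = employee_from_dict(json.loads(json.dumps(asdict(emp))))
print(restored == emp)  # True
```

Calling Employee(**data) directly would leave address as a plain dict, because dataclasses do not convert or validate the values they are given.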

For types that json.dumps cannot handle (like datetime), use a custom encoder:

from dataclasses import dataclass, asdict
from datetime import datetime
import json

@dataclass
class Event:
    name: str
    timestamp: datetime

class DataclassEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

event = Event("deploy", datetime(2026, 2, 12, 14, 30))
json_str = json.dumps(asdict(event), cls=DataclassEncoder, indent=2)
# {"name": "deploy", "timestamp": "2026-02-12T14:30:00"} (spread over several lines because of indent=2)

12. Validation Patterns

Dataclasses do not validate types at runtime. If you pass a string where an int is expected, it is silently accepted. Add validation in __post_init__:

from dataclasses import dataclass

@dataclass
class UserProfile:
    username: str
    age: int
    email: str

    def __post_init__(self):
        if not isinstance(self.username, str):
            raise ValueError(f"Username must be a string, got: {self.username!r}")
        # Normalize first so the length check sees the value that is stored
        self.username = self.username.strip().lower()
        if len(self.username) < 3:
            raise ValueError(f"Username must be 3+ chars, got: {self.username!r}")
        if not isinstance(self.age, int) or not (0 <= self.age <= 150):
            raise ValueError(f"Age must be 0-150, got: {self.age!r}")
        if "@" not in self.email:
            raise ValueError(f"Invalid email: {self.email!r}")
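Hand-written isinstance checks get repetitive, so one option is a small reusable helper driven by dataclasses.fields(). The check_simple_types function below is a hypothetical sketch: it only handles fields annotated with plain classes and deliberately skips generics such as list[str] and string annotations.

```python
from dataclasses import dataclass, fields
from typing import get_origin

def check_simple_types(instance) -> None:
    """Hypothetical helper: isinstance-check fields annotated with plain classes."""
    for f in fields(instance):
        expected = f.type
        # Skip string annotations and generics such as list[str]
        if not isinstance(expected, type) or get_origin(expected) is not None:
            continue
        value = getattr(instance, f.name)
        if not isinstance(value, expected):
            raise TypeError(
                f"{f.name} must be {expected.__name__}, got {type(value).__name__}"
            )

@dataclass
class Item:
    name: str
    quantity: int
    tags: list[str]

    def __post_init__(self):
        check_simple_types(self)

ok = Item("widget", 3, ["a"])
try:
    Item("widget", "3", [])
except TypeError as exc:
    type_error_message = str(exc)
    print(type_error_message)  # quantity must be int, got str
```

Anything beyond this (unions, nested generics, coercion) is where a dedicated validation library starts to pay off.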

For more robust validation, pydantic.dataclasses offers a largely drop-in replacement that adds Pydantic validation while keeping dataclass syntax (the example uses the Pydantic v2 field_validator API):

from pydantic.dataclasses import dataclass  # Drop-in replacement
from pydantic import field_validator

@dataclass
class StrictUser:
    name: str
    age: int
    email: str

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        if v < 0 or v > 150:
            raise ValueError("Age must be between 0 and 150")
        return v

user = StrictUser(name="Alice", age="30", email="a@b.com")  # Coerces "30" to 30

13. Dataclasses vs Pydantic vs attrs

Feature             dataclasses     Pydantic          attrs
Stdlib              Yes (3.7+)      No                No
Runtime validation  Manual          Automatic         Optional
Type coercion       No              Yes               No
JSON serialization  asdict + json   Built-in (Rust)   Via cattrs
JSON Schema         No              Built-in          No
Performance         Fastest         Fast (Rust)       Fast (pure Python)
Best for            Internal data   External data     Complex hierarchies

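Decision guide: use dataclasses for internal data whose types you trust, Pydantic when data crosses a trust boundary (API requests, config files, user input), and attrs for complex class hierarchies.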

14. Real-World Patterns and Best Practices

Pattern 1: Builder pattern with replace(). Use dataclasses.replace() for immutable updates, creating a fluent builder:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    table: str
    conditions: tuple[str, ...] = ()
    limit: int | None = None
    offset: int = 0

    def where(self, condition: str) -> "Query":
        return replace(self, conditions=self.conditions + (condition,))

    def with_limit(self, n: int) -> "Query":
        return replace(self, limit=n)

    def with_offset(self, n: int) -> "Query":
        return replace(self, offset=n)

query = (
    Query("users")
    .where("age > 18")
    .where("active = true")
    .with_limit(10)
    .with_offset(20)
)
print(query)
# Query(table='users', conditions=('age > 18', 'active = true'), limit=10, offset=20)

Pattern 2: Registry with ClassVar. Use ClassVar for class-level data that is not a field:

from dataclasses import dataclass
from typing import ClassVar

@dataclass
class Plugin:
    name: str
    version: str
    registry: ClassVar[dict[str, "Plugin"]] = {}

    def __post_init__(self):
        Plugin.registry[self.name] = self

Plugin("auth", "1.0.0")
print(Plugin.registry["auth"])  # Plugin(name='auth', version='1.0.0')
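Because ClassVar annotations are skipped by the decorator, they never show up in __init__, __repr__, or fields(). A quick sketch with a made-up Counter class that tracks how many instances were created:

```python
from dataclasses import dataclass, fields
from typing import ClassVar

@dataclass
class Counter:
    label: str
    created: ClassVar[int] = 0  # Class-level state, not a dataclass field

    def __post_init__(self):
        Counter.created += 1

first = Counter("a")
second = Counter("b")

print([f.name for f in fields(Counter)])  # ['label']
print(Counter.created)                    # 2
print(first)                              # Counter(label='a')
```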

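Pattern 3: Immutable event or message. Note that frozen=True only blocks attribute reassignment: the payload dict can still be mutated in place, and because dicts are unhashable, an Event like this cannot be used as a dict key or set member.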

from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass(frozen=True, slots=True)
class Event:
    type: str
    payload: dict
    id: str = field(default_factory=lambda: str(uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

event = Event("user.created", {"user_id": 42})

General best practices:

Frequently Asked Questions

What are Python dataclasses and why should I use them?

Python dataclasses (introduced in Python 3.7) are a decorator-based way to create classes that primarily store data. The @dataclass decorator automatically generates __init__, __repr__, and __eq__ methods based on your type-annotated fields. Use them to eliminate boilerplate when you need structured data containers with type hints, comparison support, and clean string representations without writing repetitive constructor code.

What is the difference between dataclasses and Pydantic?

Dataclasses are part of Python's standard library and generate boilerplate methods like __init__ and __repr__, but perform zero runtime validation. Pydantic validates and coerces every field at runtime, raises clear errors for invalid data, and provides built-in JSON serialization. Use dataclasses for internal data structures where you trust the types. Use Pydantic when data crosses a trust boundary such as API requests, config files, or user input.

How do I make a dataclass immutable (frozen)?

Use @dataclass(frozen=True) to make instances immutable. Frozen dataclasses raise FrozenInstanceError if you try to assign to any field after creation. They are also hashable by default (provided every field value is hashable), meaning you can use them as dictionary keys or in sets. Frozen dataclasses are ideal for configuration objects, value objects in domain-driven design, and any data that should not change after initialization.

What does __slots__ do in a dataclass?

Adding slots=True to @dataclass (Python 3.10+) generates a __slots__ attribute, which stores fields in a fixed-size array instead of a per-instance __dict__. This reduces memory usage (often cited as roughly 30–40%, depending on Python version and field count) and can speed up attribute access. The tradeoff is that you cannot add arbitrary attributes to instances at runtime. Use slots=True for classes you create many instances of, such as records in data processing pipelines.

Can I add validation to dataclasses?

Yes, use the __post_init__ method to add validation logic that runs immediately after __init__. Inside __post_init__, check field values and raise ValueError or TypeError for invalid data. For more advanced validation, you can use pydantic.dataclasses.dataclass as a drop-in replacement that adds full Pydantic validation to dataclass syntax. You can also use third-party libraries like beartype for runtime type checking.

Conclusion

Python dataclasses are the standard tool for creating data containers in Python. They eliminate constructor and comparison boilerplate, integrate perfectly with type checkers, and require zero external dependencies. With Python 3.10+ features like slots=True, kw_only=True, and match_args, they are more powerful than ever.

Start with @dataclass(frozen=True, slots=True) for most new classes. Add kw_only=True when you have many fields. Use __post_init__ for validation and computed fields. When you need runtime validation for untrusted data, reach for Pydantic — but for internal data structures, dataclasses are all you need.
