SQLAlchemy Complete Guide: ORM, Models, Queries & Best Practices
SQLAlchemy is the most widely used database toolkit for Python. It provides two distinct layers for working with databases: a low-level Core layer for constructing SQL expressions in Python, and a high-level ORM layer that maps database rows to Python objects. Whether you are building a FastAPI application, a Flask web service, a CLI tool, or a data pipeline, SQLAlchemy gives you the tools to interact with relational databases in a Pythonic, type-safe, and performant way.
This guide covers everything from installing SQLAlchemy and creating your first engine to advanced patterns like async sessions, hybrid properties, event listeners, and the SQLAlchemy 2.0 style. Every section includes complete, working code examples that you can copy directly into your project.
Table of Contents
- What Is SQLAlchemy — Core vs ORM
- Installation & Engine Setup
- Declarative Models
- CRUD Operations
- Querying — select, filter, join, aggregate
- Relationships
- Migrations with Alembic
- Advanced Patterns
- Async SQLAlchemy
- Performance Optimization
- SQLAlchemy 2.0 Style
- Testing with SQLAlchemy
- Frequently Asked Questions
1. What Is SQLAlchemy — Core vs ORM
SQLAlchemy is a Python SQL toolkit and Object-Relational Mapper created by Mike Bayer. Unlike ORMs that try to hide SQL completely, SQLAlchemy embraces the relational model and gives you two complementary ways to interact with databases:
- SQLAlchemy Core — a SQL expression language that lets you construct SQL statements using Python objects. You work with Table, Column, select(), insert(), and join() objects. Core gives you full control over the generated SQL and is ideal for bulk operations, reporting, and complex queries where ORM overhead is unnecessary.
- SQLAlchemy ORM — a higher-level layer built on Core that maps database rows to Python class instances. The ORM manages object identity (each row is a single Python object), tracks changes automatically (dirty checking), and handles relationships between tables. This is what most developers use for application logic.
The two layers are not mutually exclusive. You can use ORM models for your application logic and drop down to Core expressions for performance-critical queries, all within the same codebase and session.
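To make the Core layer concrete, here is a minimal sketch that builds and renders a statement without connecting to any database. The table and column names are illustrative only:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select

metadata = MetaData()

# A Core Table object: no ORM class, no session involved
users = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("username", String(50)),
)

# Build a SQL expression in Python and render it as a string
stmt = select(users.c.username).where(users.c.id == 1)
print(stmt)
```

Printing the statement shows the generated SQL with a bound parameter placeholder, which is a handy way to inspect what Core will send to the database.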
2. Installation & Engine Setup
Install SQLAlchemy and a database driver. The driver you need depends on your database:
# SQLAlchemy core package
pip install sqlalchemy
# Database drivers
pip install psycopg2-binary # PostgreSQL (sync)
pip install asyncpg # PostgreSQL (async)
pip install pymysql # MySQL
pip install aiosqlite # SQLite (async)
# With async support
pip install sqlalchemy[asyncio]
The engine is the starting point for all SQLAlchemy operations. It manages a connection pool and provides connections to the database:
from sqlalchemy import create_engine
# SQLite (file-based)
engine = create_engine("sqlite:///app.db")
# SQLite (in-memory, great for testing)
engine = create_engine("sqlite:///:memory:")
# PostgreSQL
engine = create_engine("postgresql://user:password@localhost:5432/mydb")
# MySQL
engine = create_engine("mysql+pymysql://user:password@localhost:3306/mydb")
# Production settings with connection pool tuning
engine = create_engine(
"postgresql://user:password@localhost:5432/mydb",
echo=False, # Set True to log all SQL statements
pool_size=10, # Number of persistent connections
max_overflow=20, # Extra connections when pool is full
pool_timeout=30, # Seconds to wait for a connection
pool_recycle=1800, # Recycle connections after 30 minutes
pool_pre_ping=True, # Test connections before using them
)
The echo=True parameter is invaluable during development. It logs every SQL statement that SQLAlchemy generates, helping you understand and optimize your queries.
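As a quick sanity check that an engine is configured correctly, you can execute a trivial statement directly on it. This sketch uses an in-memory SQLite database so it runs anywhere:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:", echo=False)

# Borrow a connection from the pool, run a statement, return it to the pool
with engine.connect() as conn:
    value = conn.execute(text("SELECT 1")).scalar()
    print(value)  # 1
```

The `with engine.connect()` block checks a connection out of the pool and returns it automatically when the block exits, so you never leak connections.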
3. Declarative Models
In SQLAlchemy 2.0, models are defined using DeclarativeBase and Mapped type annotations. Each class maps to a database table, and each attribute maps to a column:
from datetime import datetime
from typing import Optional
from sqlalchemy import String, Text, ForeignKey, func
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str] = mapped_column(String(50), unique=True, index=True)
email: Mapped[str] = mapped_column(String(255), unique=True)
bio: Mapped[Optional[str]] = mapped_column(Text, default=None)
is_active: Mapped[bool] = mapped_column(default=True)
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
# Relationships
posts: Mapped[list["Post"]] = relationship(back_populates="author")
def __repr__(self) -> str:
return f"User(id={self.id}, username={self.username!r})"
class Post(Base):
__tablename__ = "posts"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str] = mapped_column(String(300))
slug: Mapped[str] = mapped_column(String(300), unique=True, index=True)
body: Mapped[str] = mapped_column(Text)
status: Mapped[str] = mapped_column(String(20), default="draft")
views: Mapped[int] = mapped_column(default=0)
author_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
# Relationships
author: Mapped["User"] = relationship(back_populates="posts")
tags: Mapped[list["Tag"]] = relationship(
secondary="post_tags", back_populates="posts"
)
def __repr__(self) -> str:
return f"Post(id={self.id}, title={self.title!r})"
Key points about the 2.0 declarative style:
- Mapped[str] means the column is NOT NULL; Mapped[Optional[str]] means it is nullable.
- mapped_column() replaces the old Column() and integrates with type checkers.
- Use __table_args__ for table-level constraints and indexes.
from sqlalchemy import Index, UniqueConstraint

class Post(Base):
__tablename__ = "posts"
__table_args__ = (
UniqueConstraint("author_id", "slug", name="uq_author_slug"),
Index("ix_status_created", "status", "created_at"),
{"schema": "blog"}, # Optional: specify database schema
)
# ... columns ...
Create all tables from your models:
# Create all tables (development only; use Alembic in production)
Base.metadata.create_all(engine)
# Drop all tables
Base.metadata.drop_all(engine)
4. CRUD Operations
All ORM operations happen through a Session. The session manages the conversation between your Python code and the database, tracks changes to objects, and flushes those changes as SQL statements:
from sqlalchemy.orm import Session, sessionmaker
# Create a session factory
SessionLocal = sessionmaker(bind=engine)
# Use as a context manager (recommended)
with Session(engine) as session:
# CREATE
user = User(username="alice", email="alice@example.com")
session.add(user)
session.commit()
# READ
user = session.get(User, 1) # Get by primary key
print(user.username)
# UPDATE
user.email = "newalice@example.com" # Just modify the attribute
session.commit() # SQLAlchemy detects the change and issues UPDATE
# DELETE
session.delete(user)
session.commit()
For adding multiple objects at once:
with Session(engine) as session:
users = [
User(username="bob", email="bob@example.com"),
User(username="carol", email="carol@example.com"),
User(username="dave", email="dave@example.com"),
]
session.add_all(users)
session.commit()
Always wrap operations in try/except and call session.rollback() on IntegrityError or other exceptions to keep the session in a usable state.
5. Querying
SQLAlchemy 2.0 uses the select() construct for all queries. You execute a statement and extract results from the returned Result object:
from sqlalchemy import select, func, and_, or_, desc
with Session(engine) as session:
# Select all users
stmt = select(User)
users = session.execute(stmt).scalars().all()
# Filter with where()
stmt = select(User).where(User.is_active == True)
active_users = session.execute(stmt).scalars().all()
# Filter with filter_by() (keyword arguments)
stmt = select(User).filter_by(username="alice")
alice = session.execute(stmt).scalars().first()
# Multiple conditions (AND)
stmt = select(Post).where(
Post.status == "published",
Post.views > 100
)
# OR conditions
stmt = select(Post).where(
or_(Post.status == "published", Post.author_id == 1)
)
# Ordering
stmt = select(Post).order_by(desc(Post.created_at))
# Limit and offset (pagination)
stmt = select(Post).order_by(Post.id).limit(10).offset(20)
# Count
stmt = select(func.count()).select_from(User)
total = session.execute(stmt).scalar()
# Specific columns
stmt = select(User.username, User.email)
rows = session.execute(stmt).all()
for username, email in rows:
print(f"{username}: {email}")
Joins
# Implicit join via relationship
stmt = select(Post).join(Post.author).where(User.username == "alice")
alice_posts = session.execute(stmt).scalars().all()
# Explicit join
stmt = select(User.username, func.count(Post.id)).join(
Post, User.id == Post.author_id
).group_by(User.username)
# Left outer join
stmt = select(User, func.count(Post.id)).outerjoin(
Post, User.id == Post.author_id
).group_by(User.id)
Aggregations and Subqueries
# Aggregation with group_by
stmt = (
select(Post.status, func.count(Post.id).label("total"))
.group_by(Post.status)
.having(func.count(Post.id) > 5)
)
results = session.execute(stmt).all()
# Subquery: users with more than 10 posts
subq = (
select(Post.author_id, func.count(Post.id).label("post_count"))
.group_by(Post.author_id)
.having(func.count(Post.id) > 10)
.subquery()
)
stmt = select(User).join(subq, User.id == subq.c.author_id)
prolific_users = session.execute(stmt).scalars().all()
6. Relationships
SQLAlchemy supports all standard relationship patterns. Use relationship() to define how models connect and back_populates for bidirectional navigation:
One-to-Many
class Department(Base):
__tablename__ = "departments"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(100))
employees: Mapped[list["Employee"]] = relationship(back_populates="department")
class Employee(Base):
__tablename__ = "employees"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(100))
department_id: Mapped[int] = mapped_column(ForeignKey("departments.id"))
department: Mapped["Department"] = relationship(back_populates="employees")
# Usage
eng = Department(name="Engineering")
eng.employees.append(Employee(name="Alice"))
eng.employees.append(Employee(name="Bob"))
session.add(eng)
session.commit()
Many-to-Many
from sqlalchemy import Table, Column, Integer, ForeignKey
# Association table (no ORM model needed for simple M2M)
post_tags = Table(
"post_tags",
Base.metadata,
Column("post_id", Integer, ForeignKey("posts.id"), primary_key=True),
Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
)
class Tag(Base):
__tablename__ = "tags"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(50), unique=True)
posts: Mapped[list["Post"]] = relationship(
secondary=post_tags, back_populates="tags"
)
# Usage
post = Post(title="SQLAlchemy Guide", slug="sqlalchemy-guide", body="...")
post.tags.append(Tag(name="python"))
post.tags.append(Tag(name="database"))
Lazy Loading Strategies
from sqlalchemy.orm import joinedload, selectinload
# Per-query eager loading (preferred approach)
stmt = select(User).options(selectinload(User.posts))
users = session.execute(stmt).scalars().all()
# Nested eager loading
stmt = select(User).options(
selectinload(User.posts).selectinload(Post.tags)
)
# Joined load for many-to-one
stmt = select(Post).options(joinedload(Post.author))
7. Migrations with Alembic
Alembic is the official migration tool for SQLAlchemy. It tracks schema changes in version-controlled migration scripts, similar to Django migrations or Rails ActiveRecord migrations.
# Install Alembic
pip install alembic
# Initialize Alembic in your project
alembic init alembic
# This creates:
# alembic/ - migration scripts directory
# alembic/env.py - environment configuration
# alembic.ini - Alembic configuration file
Configure alembic/env.py to import your models and set target_metadata = Base.metadata. Set your database URL in alembic.ini. Then generate and apply migrations:
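The key change in env.py is importing your metadata. A minimal excerpt, assuming your models live in a module like myapp.models (a hypothetical path; adjust it to your project):

```python
# alembic/env.py (excerpt)
from myapp.models import Base  # hypothetical import path

# Alembic compares this metadata against the live database
# when you run `alembic revision --autogenerate`
target_metadata = Base.metadata
```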
# Auto-generate a migration by comparing models to database
alembic revision --autogenerate -m "create users and posts tables"
# Apply all pending migrations
alembic upgrade head
# Downgrade one step
alembic downgrade -1
# Show current migration version
alembic current
# Show migration history
alembic history --verbose
# Generate an empty migration for custom SQL
alembic revision -m "add custom index"
Example auto-generated migration file:
"""create users and posts tables
Revision ID: a1b2c3d4e5f6
Revises:
Create Date: 2026-02-12 10:00:00.000000
"""
from alembic import op
import sqlalchemy as sa
revision = "a1b2c3d4e5f6"
down_revision = None
def upgrade():
op.create_table(
"users",
sa.Column("id", sa.Integer(), primary_key=True),
sa.Column("username", sa.String(50), unique=True, nullable=False),
sa.Column("email", sa.String(255), unique=True, nullable=False),
)
op.create_index("ix_users_username", "users", ["username"])
def downgrade():
op.drop_index("ix_users_username", table_name="users")
op.drop_table("users")
Autogenerate cannot detect everything (data migrations, renamed columns, and some constraint changes, for example), so run alembic revision without --autogenerate and write the operations by hand for those cases.
8. Advanced Patterns
Hybrid Properties
Hybrid properties work both as Python properties and in SQL queries:
from sqlalchemy.ext.hybrid import hybrid_property
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
first_name: Mapped[str] = mapped_column(String(50))
last_name: Mapped[str] = mapped_column(String(50))
@hybrid_property
def full_name(self) -> str:
return f"{self.first_name} {self.last_name}"
@full_name.expression
@classmethod
def full_name(cls):
return cls.first_name + " " + cls.last_name
# Works in Python
user.full_name # "Alice Smith"
# Works in SQL queries
stmt = select(User).where(User.full_name == "Alice Smith")
Event Listeners
from datetime import datetime, timezone
from sqlalchemy import event
@event.listens_for(User, "before_insert")
def set_created_at(mapper, connection, target):
    target.created_at = datetime.now(timezone.utc)
@event.listens_for(Post, "before_update")
def update_slug(mapper, connection, target):
if target.title:
target.slug = target.title.lower().replace(" ", "-")
# Session-level events
@event.listens_for(Session, "after_commit")
def after_commit(session):
print("Transaction committed")
Mixins for Reusable Columns
from datetime import datetime
class TimestampMixin:
created_at: Mapped[datetime] = mapped_column(server_default=func.now())
updated_at: Mapped[datetime] = mapped_column(
server_default=func.now(), onupdate=func.now()
)
class SoftDeleteMixin:
is_deleted: Mapped[bool] = mapped_column(default=False)
deleted_at: Mapped[Optional[datetime]] = mapped_column(default=None)
class User(TimestampMixin, SoftDeleteMixin, Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str] = mapped_column(String(50))
# Inherits created_at, updated_at, is_deleted, deleted_at
Custom Types
import json
from sqlalchemy.types import TypeDecorator, Text
class JSONEncodedDict(TypeDecorator):
"""Store a Python dict as a JSON string."""
impl = Text
cache_ok = True
def process_bind_param(self, value, dialect):
if value is not None:
return json.dumps(value)
def process_result_value(self, value, dialect):
if value is not None:
return json.loads(value)
class Config(Base):
__tablename__ = "configs"
id: Mapped[int] = mapped_column(primary_key=True)
settings: Mapped[dict] = mapped_column(JSONEncodedDict)
9. Async SQLAlchemy
SQLAlchemy has first-class async support for use with asyncio, making it the ideal choice for FastAPI, Starlette, and other async frameworks:
from sqlalchemy.ext.asyncio import (
create_async_engine,
AsyncSession,
async_sessionmaker,
)
# Async engine (note the async driver in the URL)
async_engine = create_async_engine(
"postgresql+asyncpg://user:password@localhost/mydb",
echo=True,
pool_size=10,
)
# Async session factory
AsyncSessionLocal = async_sessionmaker(async_engine, expire_on_commit=False)
Async CRUD Operations
import asyncio
from sqlalchemy import select
async def create_user(username: str, email: str) -> User:
async with AsyncSessionLocal() as session:
user = User(username=username, email=email)
session.add(user)
await session.commit()
await session.refresh(user)
return user
async def get_users() -> list[User]:
async with AsyncSessionLocal() as session:
stmt = select(User).where(User.is_active == True)
result = await session.execute(stmt)
return result.scalars().all()
async def update_user(user_id: int, email: str) -> None:
async with AsyncSessionLocal() as session:
user = await session.get(User, user_id)
if user:
user.email = email
await session.commit()
# Run it
asyncio.run(create_user("alice", "alice@example.com"))
FastAPI Integration
from fastapi import FastAPI, Depends
app = FastAPI()
async def get_db():
async with AsyncSessionLocal() as session:
yield session
@app.get("/users")
async def list_users(db: AsyncSession = Depends(get_db)):
stmt = select(User).where(User.is_active == True)
result = await db.execute(stmt)
return result.scalars().all()
@app.post("/users")
async def create_user(username: str, email: str, db: AsyncSession = Depends(get_db)):
user = User(username=username, email=email)
db.add(user)
await db.commit()
await db.refresh(user)
return user
10. Performance Optimization
Solving the N+1 Problem
# BAD: N+1 queries (1 for users + N for posts)
users = session.execute(select(User)).scalars().all()
for user in users:
print(user.posts) # Each access triggers a SELECT
# GOOD: Eager load with selectinload (2 queries total)
stmt = select(User).options(selectinload(User.posts))
users = session.execute(stmt).scalars().all()
for user in users:
print(user.posts) # Already loaded, no extra query
# GOOD: Eager load with joinedload (1 query with JOIN)
stmt = select(Post).options(joinedload(Post.author))
posts = session.execute(stmt).scalars().unique().all()
Bulk Operations
from sqlalchemy import insert, update, delete
# Bulk insert (bypasses ORM, much faster for large datasets)
with Session(engine) as session:
session.execute(
insert(User),
[
{"username": f"user_{i}", "email": f"user_{i}@example.com"}
for i in range(10000)
],
)
session.commit()
# Bulk update
with Session(engine) as session:
session.execute(
update(Post).where(Post.status == "draft").values(status="archived")
)
session.commit()
# Bulk delete
with Session(engine) as session:
session.execute(
delete(Post).where(Post.created_at < datetime(2025, 1, 1))
)
session.commit()
Connection Pooling
SQLAlchemy uses QueuePool by default, maintaining a pool of persistent connections. Tune with pool_size, max_overflow, pool_timeout, pool_recycle, and pool_pre_ping=True (see the Engine Setup section above). For serverless environments like AWS Lambda, use NullPool to create a fresh connection each time.
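Switching pool classes is a single argument. This sketch shows NullPool with SQLite; in practice you would pass your real database URL:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# NullPool opens a real connection per checkout and closes it on return:
# no idle connections survive between serverless invocations
engine = create_engine("sqlite:///:memory:", poolclass=NullPool)
print(type(engine.pool).__name__)  # NullPool
```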
Performance checklist:
- Use echo=True or logging to count queries during development
- Apply selectinload() or joinedload() before iterating over related objects
- Use lazy="raise" on relationships to catch N+1 bugs early
- Use bulk insert() and update() instead of ORM flushes for thousands of rows
- Add database indexes on columns used in WHERE, ORDER BY, and JOIN
- Set expire_on_commit=False to avoid unnecessary refresh queries
- Use pool_pre_ping=True to handle stale connections gracefully
11. SQLAlchemy 2.0 Style
SQLAlchemy 2.0 brought significant changes to the API. Here is a comparison of the old 1.x style and the new 2.0 style:
# 1.x LEGACY style (still works but deprecated)
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class User(Base):
__tablename__ = "users"
id = Column(Integer, primary_key=True)
name = Column(String(50))
# Query with legacy API
users = session.query(User).filter(User.name == "alice").all()
# 2.0 CURRENT style (recommended)
from sqlalchemy import String, select
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(50))
# Query with 2.0 API
stmt = select(User).where(User.name == "alice")
users = session.execute(stmt).scalars().all()
To migrate an existing 1.x codebase, add future=True to your engine in SQLAlchemy 1.4 to enable 2.0-style behavior and surface deprecation warnings before upgrading.
12. Testing with SQLAlchemy
Use an in-memory SQLite database for fast, isolated tests. Each test gets a fresh database with no leftover data:
import pytest
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session, selectinload
from myapp.models import Base, User, Post
@pytest.fixture
def engine():
"""Create a fresh in-memory database for each test."""
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
yield engine
Base.metadata.drop_all(engine)
@pytest.fixture
def session(engine):
"""Create a session that rolls back after each test."""
with Session(engine) as session:
yield session
session.rollback()
def test_create_user(session):
user = User(username="testuser", email="test@example.com")
session.add(user)
session.commit()
result = session.execute(
select(User).where(User.username == "testuser")
)
found = result.scalars().first()
assert found is not None
assert found.email == "test@example.com"
def test_user_posts_relationship(session):
user = User(username="author", email="author@example.com")
post = Post(title="Test Post", slug="test-post", body="Content", author=user)
session.add(post)
session.commit()
stmt = select(User).options(selectinload(User.posts))
author = session.execute(stmt).scalars().first()
assert len(author.posts) == 1
assert author.posts[0].title == "Test Post"
For integration tests against a real database, use the same fixture pattern but point the engine at PostgreSQL via TEST_DATABASE_URL. Wrap each test in a transaction and roll it back after the test to keep tests isolated without re-creating tables.
Frequently Asked Questions
What is the difference between SQLAlchemy Core and SQLAlchemy ORM?
SQLAlchemy Core is a low-level SQL abstraction layer that lets you construct SQL statements using Python objects like select(), insert(), and Table. It gives you full control over the generated SQL and is ideal for bulk operations and complex reporting queries. SQLAlchemy ORM is a higher-level layer built on Core that maps database rows to Python class instances, manages object identity, tracks changes automatically, and handles relationships. Use Core when you need maximum performance or fine-grained SQL control, and ORM when you want the productivity of working with Python objects.
How do I fix the N+1 query problem in SQLAlchemy?
The N+1 problem occurs when accessing a lazy-loaded relationship on N objects triggers N additional SQL queries. Fix it by using eager loading strategies: joinedload() performs a SQL JOIN to fetch related objects in a single query (best for many-to-one), and selectinload() executes a second SELECT with an IN clause (best for one-to-many and many-to-many). Apply them per-query with session.execute(select(User).options(selectinload(User.posts))). Set lazy="raise" on relationships during development to catch N+1 patterns early.
Should I use SQLAlchemy or Django ORM for my Python project?
If you are building a Django web application, use Django ORM because it is tightly integrated with Django's migration system, admin interface, and REST framework. For any other Python project (FastAPI, Flask, standalone scripts, data pipelines), SQLAlchemy is the standard choice. It is more flexible, supports both Core and ORM layers, has first-class async support, and gives you more control over generated SQL. SQLAlchemy also excels at complex queries, custom types, and multi-database setups.
How do I use SQLAlchemy with FastAPI?
Create an async engine with create_async_engine() and use AsyncSession with async_sessionmaker. Define a FastAPI dependency that yields a session: an async def get_db() that opens the session in an async with block and yields it, so the session is closed automatically when the request finishes. Use Depends(get_db) in your route functions. Define your models with DeclarativeBase, create Pydantic schemas for request/response validation, and use the session in your route handlers for database operations.
What changed in SQLAlchemy 2.0 compared to 1.x?
SQLAlchemy 2.0 introduced the new select() syntax replacing the legacy Query API, DeclarativeBase replacing declarative_base(), Mapped and mapped_column replacing Column for type-annotated models, and removed engine.execute(). These changes improve type safety, IDE autocompletion, and consistency across Core and ORM. You can enable 2.0 style in 1.4 by passing future=True to create_engine().
Conclusion
SQLAlchemy is the most comprehensive database toolkit in the Python ecosystem. Its layered architecture — Core for SQL expressions, ORM for object mapping — gives you the flexibility to choose the right abstraction level for each task. The concepts covered in this guide — engine setup, declarative models, CRUD sessions, querying, relationships, Alembic migrations, async support, and performance optimization — represent the complete toolkit for building data-driven Python applications.
Start with the fundamentals: define models with Mapped type annotations, use select() for all queries, always apply eager loading when iterating over relationships, and use Alembic for every schema change. As your application grows, reach for hybrid properties, event listeners, mixins, bulk operations, and async sessions. Profile with echo=True, add indexes to frequently-queried columns, and use pool_pre_ping=True in production.
Related Resources
- SQL Formatter — format and beautify the SQL that SQLAlchemy generates
- SQL Playground — test and experiment with SQL queries interactively
- SQL: The Complete Guide — understand the SQL fundamentals behind SQLAlchemy
- PostgreSQL Complete Guide — deep dive into the most popular SQLAlchemy backend
- Django ORM Complete Guide — compare with Django's built-in ORM
- FastAPI Complete Guide — build async APIs with SQLAlchemy and FastAPI