SQL: The Complete Guide for Developers
SQL (Structured Query Language) is the standard language for working with relational databases. Nearly every application that persists data uses a relational database behind the scenes, which means nearly every developer needs to know SQL. Whether you are building a web application, analyzing data, writing backend APIs, or managing infrastructure, SQL is a foundational skill that you will use throughout your career.
This guide covers everything a developer needs to know about SQL: from basic SELECT queries and filtering to JOINs, subqueries, window functions, transactions, indexing strategies, and query optimization. Every concept is accompanied by practical code examples you can run against any standard relational database. By the end, you will have a complete working knowledge of SQL that applies across PostgreSQL, MySQL, SQLite, SQL Server, and any other RDBMS.
Table of Contents
- Introduction to SQL
- Database Basics
- SELECT Queries
- Filtering Data
- Aggregate Functions
- JOINs
- Subqueries
- INSERT, UPDATE, DELETE
- CREATE TABLE, ALTER TABLE, DROP TABLE
- Indexes
- Views and CTEs
- Window Functions
- Transactions
- Performance Tips
- Frequently Asked Questions
Introduction to SQL
SQL stands for Structured Query Language. It was developed at IBM in the early 1970s and has been the standard language for relational database management ever since. SQL is a declarative language: you describe what data you want, and the database engine figures out how to retrieve it efficiently. You do not write loops or specify algorithms. You write queries.
SQL works with relational databases, which store data in tables with predefined relationships between them. A relational database management system (RDBMS) is the software that implements this model and processes SQL queries. The most widely used RDBMS options are:
- PostgreSQL — Open source, feature-rich, strong standards compliance. Excellent for complex queries, JSON support, and extensibility. The most popular choice for new projects.
- MySQL — Open source, widely deployed, fast for read-heavy workloads. Powers a significant portion of the web including WordPress. Owned by Oracle.
- SQLite — Serverless, embedded database stored in a single file. Perfect for mobile apps, desktop applications, and development/testing. No separate server process needed.
- SQL Server — Microsoft's enterprise RDBMS. Deep integration with .NET and Azure. Strong tooling with SQL Server Management Studio.
- Oracle Database — Enterprise-focused, highly scalable. Dominant in large corporations and financial institutions. Expensive licensing.
- MariaDB — Community fork of MySQL with additional features and performance improvements. Drop-in MySQL replacement.
The core SQL syntax is standardized by ANSI/ISO, so the SELECT, INSERT, UPDATE, DELETE, JOIN, and WHERE clauses you learn work across all of these systems. Each RDBMS adds its own extensions (PostgreSQL has JSONB operators, MySQL has LIMIT syntax differences, SQL Server has TOP instead of LIMIT), but 90% of what you write is portable. This guide uses standard SQL that runs on any of these systems, noting dialect differences where they matter.
Database Basics
A relational database organizes data into tables. Each table represents an entity (users, orders, products) and consists of columns (attributes) and rows (individual records). A schema is the overall structure of the database: the tables, their columns, data types, constraints, and relationships.
Tables, Rows, and Columns
Think of a table as a spreadsheet. Columns define the fields (name, email, created_at), and each row is one record. Here is what a users table looks like conceptually:
/* users table:
+----+----------+---------------------+------------+
| id | name | email | created_at |
+----+----------+---------------------+------------+
| 1 | Alice | alice@example.com | 2026-01-15 |
| 2 | Bob | bob@example.com | 2026-01-20 |
| 3 | Charlie | charlie@example.com | 2026-02-01 |
+----+----------+---------------------+------------+
- 4 columns: id, name, email, created_at
- 3 rows (records)
- id is the primary key (unique identifier for each row)
*/
Data Types
Every column has a data type that determines what values it can hold. The most common SQL data types are:
-- Numeric types
INTEGER -- Whole numbers: 1, 42, -100
BIGINT -- Large whole numbers (8 bytes)
DECIMAL(10, 2) -- Exact decimal: 99999999.99 (for money)
FLOAT / REAL -- Approximate floating-point numbers
BOOLEAN -- TRUE or FALSE
-- String types
VARCHAR(255) -- Variable-length string up to 255 characters
TEXT -- Variable-length string with no practical limit
CHAR(10) -- Fixed-length string, padded with spaces
-- Date/Time types
DATE -- Date only: '2026-02-11'
TIME -- Time only: '14:30:00'
TIMESTAMP -- Date and time: '2026-02-11 14:30:00'
INTERVAL -- Duration: '3 days', '2 hours'
-- Other common types
UUID -- Universally unique identifier (PostgreSQL)
JSON / JSONB -- JSON data (PostgreSQL, MySQL 5.7+)
BYTEA / BLOB -- Binary data
Primary Keys and Foreign Keys
A primary key uniquely identifies each row in a table. It must be unique and not NULL. Most tables use an auto-incrementing integer or a UUID as the primary key.
A foreign key is a column that references the primary key of another table, creating a relationship between the two tables. This is the "relational" in relational database.
-- orders table references users table via user_id
/*
orders:
+----+---------+--------+------------+
| id | user_id | amount | order_date |
+----+---------+--------+------------+
| 1 | 1 | 59.99 | 2026-02-01 |
| 2 | 1 | 24.50 | 2026-02-05 |
| 3 | 2 | 149.00 | 2026-02-08 |
+----+---------+--------+------------+
user_id is a foreign key referencing users(id)
Order 1 and 2 belong to Alice (user_id = 1)
Order 3 belongs to Bob (user_id = 2)
*/
Schemas and Constraints
Constraints enforce rules on the data to maintain integrity:
-- Common constraints
PRIMARY KEY -- Unique identifier, not NULL
FOREIGN KEY -- References another table's primary key
NOT NULL -- Column cannot contain NULL
UNIQUE -- All values in the column must be distinct
CHECK -- Custom condition (e.g., CHECK (age >= 0))
DEFAULT -- Default value if none is provided
SELECT Queries
The SELECT statement is the most used SQL command. It retrieves data from one or more tables. Every query you write to read data starts with SELECT.
Basic SELECT
-- Select all columns from a table
SELECT * FROM users;
-- Select specific columns
SELECT name, email FROM users;
-- Alias columns for cleaner output
SELECT name AS user_name, email AS user_email FROM users;
-- Select with expressions
SELECT name, amount, amount * 0.1 AS tax FROM orders;
WHERE: Filtering Rows
-- Filter by condition
SELECT * FROM users WHERE name = 'Alice';
-- Multiple conditions with AND/OR
SELECT * FROM orders WHERE amount > 50 AND order_date >= '2026-02-01';
SELECT * FROM users WHERE name = 'Alice' OR name = 'Bob';
ORDER BY: Sorting Results
-- Sort ascending (default)
SELECT * FROM users ORDER BY name;
-- Sort descending
SELECT * FROM orders ORDER BY amount DESC;
-- Sort by multiple columns
SELECT * FROM orders ORDER BY user_id ASC, order_date DESC;
LIMIT and OFFSET: Pagination
-- Get the first 10 rows
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;
-- Skip the first 20 rows, then get 10 (page 3 of 10-per-page)
SELECT * FROM users ORDER BY created_at DESC LIMIT 10 OFFSET 20;
-- SQL Server uses TOP instead of LIMIT:
-- SELECT TOP 10 * FROM users ORDER BY created_at DESC;
DISTINCT: Removing Duplicates
-- Get unique values
SELECT DISTINCT city FROM customers;
-- Count unique values
SELECT COUNT(DISTINCT city) AS unique_cities FROM customers;
-- Distinct across multiple columns (unique combinations)
SELECT DISTINCT city, state FROM customers;
Filtering Data
The WHERE clause supports a rich set of operators for filtering data. Mastering these operators is essential for writing precise queries.
Comparison Operators
SELECT * FROM products WHERE price = 29.99; -- Equal
SELECT * FROM products WHERE price != 29.99; -- Not equal (also <>)
SELECT * FROM products WHERE price > 50; -- Greater than
SELECT * FROM products WHERE price < 50; -- Less than
SELECT * FROM products WHERE price >= 50; -- Greater than or equal
SELECT * FROM products WHERE price <= 50; -- Less than or equal
LIKE: Pattern Matching
-- % matches any sequence of characters
SELECT * FROM users WHERE email LIKE '%@gmail.com'; -- Ends with @gmail.com
SELECT * FROM users WHERE name LIKE 'A%'; -- Starts with A
SELECT * FROM users WHERE name LIKE '%son%'; -- Contains "son"
-- _ matches exactly one character
SELECT * FROM products WHERE sku LIKE 'AB_123'; -- AB followed by any char then 123
-- Case-insensitive matching (PostgreSQL)
SELECT * FROM users WHERE name ILIKE '%alice%';
-- MySQL is case-insensitive by default with LIKE
IN: Matching a List
-- Match any value in a list
SELECT * FROM users WHERE id IN (1, 3, 5, 7);
-- Equivalent to multiple OR conditions
SELECT * FROM users WHERE id = 1 OR id = 3 OR id = 5 OR id = 7;
-- NOT IN: exclude values
SELECT * FROM users WHERE status NOT IN ('banned', 'deleted');
-- IN with a subquery
SELECT * FROM users WHERE id IN (
SELECT user_id FROM orders WHERE amount > 100
);
BETWEEN: Range Filtering
-- Inclusive range (includes both endpoints)
SELECT * FROM orders WHERE amount BETWEEN 50 AND 200;
-- Equivalent to:
SELECT * FROM orders WHERE amount >= 50 AND amount <= 200;
-- Date ranges
SELECT * FROM orders WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31';
-- NOT BETWEEN
SELECT * FROM products WHERE price NOT BETWEEN 10 AND 50;
IS NULL and IS NOT NULL
-- NULL represents missing or unknown data
-- You cannot use = to check for NULL; you must use IS NULL
SELECT * FROM users WHERE phone IS NULL; -- Users with no phone number
SELECT * FROM users WHERE phone IS NOT NULL; -- Users who have a phone number
-- COALESCE: return the first non-NULL value
SELECT name, COALESCE(phone, 'N/A') AS phone FROM users;
-- NULLIF: return NULL if two values are equal
SELECT NULLIF(discount, 0) AS discount FROM products; -- Treat 0 as NULL
Combining Conditions
-- AND, OR, NOT with parentheses for clarity
SELECT * FROM products
WHERE (category = 'Electronics' OR category = 'Books')
AND price < 100
AND stock IS NOT NULL;
-- Without parentheses, AND binds tighter than OR
-- Always use parentheses to make your intent clear
Aggregate Functions
Aggregate functions compute a single result from a set of rows. They are the foundation of reporting and analytics queries.
COUNT, SUM, AVG, MIN, MAX
-- COUNT: number of rows
SELECT COUNT(*) AS total_users FROM users;
SELECT COUNT(phone) AS users_with_phone FROM users; -- Excludes NULLs
SELECT COUNT(DISTINCT city) AS unique_cities FROM customers;
-- SUM: total of a numeric column
SELECT SUM(amount) AS total_revenue FROM orders;
-- AVG: average value
SELECT AVG(amount) AS average_order FROM orders;
-- MIN and MAX
SELECT MIN(price) AS cheapest, MAX(price) AS most_expensive FROM products;
-- Combine multiple aggregates
SELECT
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM orders;
GROUP BY: Grouping Rows
GROUP BY divides rows into groups and applies aggregate functions to each group separately.
-- Total orders and revenue per user
SELECT
user_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY user_id;
-- Orders per month
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS orders,
SUM(amount) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;
-- Products per category
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
GROUP BY category
ORDER BY product_count DESC;
HAVING: Filtering Groups
WHERE filters individual rows before grouping. HAVING filters groups after aggregation. You cannot use aggregate functions in WHERE; use HAVING instead.
-- Users who have placed more than 5 orders
SELECT
user_id,
COUNT(*) AS order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;
-- Categories with an average price above 50
SELECT
category,
AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 50;
-- Complete query order: WHERE -> GROUP BY -> HAVING -> ORDER BY
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
WHERE stock > 0 -- Filter rows first
GROUP BY category -- Then group
HAVING COUNT(*) >= 3 -- Then filter groups
ORDER BY avg_price DESC; -- Then sort
JOINs
JOINs combine rows from two or more tables based on a related column. They are the core mechanism for querying data that spans multiple tables, which is nearly every real-world query.
INNER JOIN
INNER JOIN returns only rows where there is a match in both tables. If a user has no orders, they are excluded. If an order references a nonexistent user, it is excluded.
-- Get user names with their orders
SELECT
users.name,
orders.id AS order_id,
orders.amount,
orders.order_date
FROM users
INNER JOIN orders ON users.id = orders.user_id;
-- Using table aliases for brevity
SELECT u.name, o.id, o.amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- Join with additional filtering
SELECT u.name, o.amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100
ORDER BY o.amount DESC;
LEFT JOIN (LEFT OUTER JOIN)
LEFT JOIN returns all rows from the left table and matching rows from the right table. If there is no match, the right side columns contain NULL.
-- All users, even those with no orders
SELECT
u.name,
o.id AS order_id,
COALESCE(o.amount, 0) AS amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
/* Result:
+----------+----------+--------+
| name | order_id | amount |
+----------+----------+--------+
| Alice | 1 | 59.99 |
| Alice | 2 | 24.50 |
| Bob | 3 | 149.00 |
| Charlie | NULL | 0 | <-- No orders
+----------+----------+--------+
*/
-- Find users who have never placed an order
SELECT u.name, u.email
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;
RIGHT JOIN (RIGHT OUTER JOIN)
RIGHT JOIN is the mirror of LEFT JOIN: it returns all rows from the right table and matching rows from the left table. In practice, most developers rewrite RIGHT JOINs as LEFT JOINs by swapping the table order, since it is easier to read.
-- All orders, even if the user was deleted
SELECT u.name, o.id AS order_id, o.amount
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;
-- Equivalent LEFT JOIN (preferred style):
SELECT u.name, o.id AS order_id, o.amount
FROM orders o
LEFT JOIN users u ON u.id = o.user_id;
FULL OUTER JOIN
FULL OUTER JOIN returns all rows from both tables. Where there is no match, the missing side contains NULL. This is useful for finding unmatched records on both sides.
-- All users and all orders, matched where possible
SELECT u.name, o.id AS order_id, o.amount
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;
-- Find data mismatches between two tables
SELECT
a.id AS table_a_id,
b.id AS table_b_id
FROM table_a a
FULL OUTER JOIN table_b b ON a.id = b.id
WHERE a.id IS NULL OR b.id IS NULL;
CROSS JOIN
CROSS JOIN produces the Cartesian product of two tables: every row from the first table combined with every row from the second. If table A has 10 rows and table B has 5 rows, the result has 50 rows. Use with caution.
-- Generate all possible size/color combinations
SELECT s.size_name, c.color_name
FROM sizes s
CROSS JOIN colors c;
/* If sizes has (S, M, L) and colors has (Red, Blue):
S-Red, S-Blue, M-Red, M-Blue, L-Red, L-Blue (6 rows)
*/
-- Useful for generating date ranges or test data
SELECT d.date, p.product_id
FROM generate_series('2026-01-01'::date, '2026-01-31'::date, '1 day') AS d(date)
CROSS JOIN products p;
Self-Joins
A self-join joins a table to itself. This is useful for hierarchical data (employees and managers) or comparing rows within the same table.
-- Employees and their managers (same table)
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
/* employees table:
+----+---------+------------+
| id | name | manager_id |
+----+---------+------------+
| 1 | Alice | NULL | (CEO, no manager)
| 2 | Bob | 1 | (Reports to Alice)
| 3 | Charlie | 1 | (Reports to Alice)
| 4 | Diana | 2 | (Reports to Bob)
+----+---------+------------+
Result:
+----------+---------+
| employee | manager |
+----------+---------+
| Alice | NULL |
| Bob | Alice |
| Charlie | Alice |
| Diana | Bob |
+----------+---------+
*/
Joining Multiple Tables
-- Orders with user names and product details
SELECT
u.name AS customer,
p.name AS product,
oi.quantity,
oi.unit_price,
o.order_date
FROM orders o
INNER JOIN users u ON o.user_id = u.id
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= '2026-01-01'
ORDER BY o.order_date DESC;
Subqueries
A subquery is a query nested inside another query. Subqueries can appear in the SELECT list, FROM clause, WHERE clause, or HAVING clause. They let you build complex queries step by step.
Scalar Subqueries
A scalar subquery returns a single value (one row, one column). It can be used anywhere a single value is expected.
-- Orders above the average order amount
SELECT * FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders);
-- Each user's order count compared to the overall average
SELECT
name,
(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) AS order_count,
(SELECT AVG(amount) FROM orders) AS overall_avg
FROM users u;
Row Subqueries
Row subqueries return multiple rows of a single column. They are used with IN, ANY, ALL, and EXISTS.
-- Users who have placed an order
SELECT * FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);
-- Products more expensive than ALL products in the 'Books' category
SELECT * FROM products
WHERE price > ALL (SELECT price FROM products WHERE category = 'Books');
-- Products more expensive than ANY product in the 'Books' category
SELECT * FROM products
WHERE price > ANY (SELECT price FROM products WHERE category = 'Books');
Table Subqueries (Derived Tables)
A table subquery returns a full result set and is used in the FROM clause as a derived table.
-- Average of per-user totals
SELECT AVG(user_total) AS avg_user_spending
FROM (
SELECT user_id, SUM(amount) AS user_total
FROM orders
GROUP BY user_id
) AS user_totals;
-- Top 3 categories by revenue with product counts
SELECT
ranked.category,
ranked.revenue,
ranked.product_count
FROM (
SELECT
category,
SUM(price * stock) AS revenue,
COUNT(*) AS product_count
FROM products
GROUP BY category
ORDER BY revenue DESC
LIMIT 3
) AS ranked;
Correlated Subqueries
A correlated subquery references columns from the outer query. It executes once for each row of the outer query, which can be slow on large tables but is sometimes the clearest way to express a query.
-- Users whose latest order is more than 30 days old
SELECT u.name, u.email
FROM users u
WHERE (
SELECT MAX(order_date) FROM orders o WHERE o.user_id = u.id
) < CURRENT_DATE - INTERVAL '30 days';
-- Each product with its rank within its category
SELECT
p.name,
p.category,
p.price,
(SELECT COUNT(*) FROM products p2
WHERE p2.category = p.category AND p2.price >= p.price
) AS rank_in_category
FROM products p
ORDER BY p.category, p.price DESC;
-- EXISTS: check if related rows exist (often faster than IN)
SELECT u.name
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 100
);
INSERT, UPDATE, DELETE
These three commands modify data in the database. They are collectively called DML (Data Manipulation Language) along with SELECT.
INSERT: Adding Rows
-- Insert a single row
INSERT INTO users (name, email, created_at)
VALUES ('Diana', 'diana@example.com', '2026-02-11');
-- Insert multiple rows at once
INSERT INTO users (name, email, created_at) VALUES
('Eve', 'eve@example.com', '2026-02-11'),
('Frank', 'frank@example.com', '2026-02-11'),
('Grace', 'grace@example.com', '2026-02-11');
-- Insert from a SELECT query
INSERT INTO archived_orders (id, user_id, amount, order_date)
SELECT id, user_id, amount, order_date
FROM orders
WHERE order_date < '2025-01-01';
-- INSERT with RETURNING (PostgreSQL) to get the inserted row back
INSERT INTO users (name, email)
VALUES ('Hank', 'hank@example.com')
RETURNING id, name, created_at;
-- INSERT ... ON CONFLICT (upsert in PostgreSQL)
INSERT INTO users (email, name)
VALUES ('alice@example.com', 'Alice Updated')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
UPDATE: Modifying Rows
-- Update specific rows
UPDATE users SET name = 'Alice Smith' WHERE id = 1;
-- Update multiple columns
UPDATE products
SET price = price * 0.9, updated_at = NOW()
WHERE category = 'Electronics';
-- Update with a subquery
UPDATE orders
SET status = 'archived'
WHERE order_date < (SELECT CURRENT_DATE - INTERVAL '1 year');
-- Update with a JOIN (PostgreSQL syntax)
UPDATE orders o
SET status = 'vip_order'
FROM users u
WHERE o.user_id = u.id AND u.membership = 'vip';
-- IMPORTANT: Always include a WHERE clause!
-- UPDATE users SET status = 'inactive'; -- This updates EVERY row!
DELETE: Removing Rows
-- Delete specific rows
DELETE FROM orders WHERE id = 42;
-- Delete based on a condition
DELETE FROM sessions WHERE expires_at < NOW();
-- Delete with a subquery
DELETE FROM users
WHERE id NOT IN (SELECT DISTINCT user_id FROM orders);
-- Delete with RETURNING (PostgreSQL)
DELETE FROM expired_tokens
WHERE expires_at < NOW()
RETURNING id, token;
-- TRUNCATE: delete all rows (faster than DELETE for full table clear)
TRUNCATE TABLE temp_data;
-- IMPORTANT: Always include a WHERE clause with DELETE!
-- DELETE FROM users; -- This deletes EVERY row!
CREATE TABLE, ALTER TABLE, DROP TABLE
These commands define and modify the database structure. They are DDL (Data Definition Language) commands.
CREATE TABLE
-- Basic table creation
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- Auto-incrementing integer (PostgreSQL)
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
phone VARCHAR(20),
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMP DEFAULT NOW()
);
-- Table with foreign key
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id),
amount DECIMAL(10, 2) NOT NULL CHECK (amount > 0),
status VARCHAR(20) DEFAULT 'pending',
order_date DATE NOT NULL DEFAULT CURRENT_DATE,
notes TEXT
);
-- Table with composite primary key
CREATE TABLE order_items (
order_id INTEGER REFERENCES orders(id) ON DELETE CASCADE,
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(10, 2) NOT NULL,
PRIMARY KEY (order_id, product_id)
);
-- CREATE TABLE IF NOT EXISTS (prevents errors if table already exists)
CREATE TABLE IF NOT EXISTS logs (
id SERIAL PRIMARY KEY,
message TEXT NOT NULL,
level VARCHAR(10) DEFAULT 'info',
created_at TIMESTAMP DEFAULT NOW()
);
-- MySQL auto-increment syntax:
-- id INT AUTO_INCREMENT PRIMARY KEY
ALTER TABLE
-- Add a column
ALTER TABLE users ADD COLUMN bio TEXT;
-- Drop a column
ALTER TABLE users DROP COLUMN phone;
-- Rename a column
ALTER TABLE users RENAME COLUMN name TO full_name;
-- Change a column's data type
ALTER TABLE products ALTER COLUMN price TYPE DECIMAL(12, 2);
-- Add a constraint
ALTER TABLE users ADD CONSTRAINT unique_email UNIQUE (email);
-- Add a foreign key
ALTER TABLE orders ADD CONSTRAINT fk_user
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE;
-- Drop a constraint
ALTER TABLE users DROP CONSTRAINT unique_email;
-- Set a default value
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
-- Rename the table
ALTER TABLE users RENAME TO app_users;
DROP TABLE
-- Drop a table (fails if table does not exist)
DROP TABLE temp_data;
-- Drop if exists (safe)
DROP TABLE IF EXISTS temp_data;
-- Drop with CASCADE (also drops dependent objects like foreign keys)
DROP TABLE IF EXISTS users CASCADE;
Indexes
An index is a data structure that speeds up data retrieval at the cost of additional storage and slower writes. Without an index, the database must scan every row in a table to find matches (a full table scan). With an index, it can jump directly to the relevant rows.
How Indexes Work
The most common index type is the B-tree (balanced tree). It organizes data in a sorted tree structure where each node contains pointers to child nodes and to actual table rows. Looking up a value in a B-tree index takes O(log n) time instead of O(n) for a full scan.
/* B-tree index conceptual structure:
Searching for user_id = 42 in an orders table with 1,000,000 rows:
Without index: scan all 1,000,000 rows ~ 1,000,000 comparisons
With B-tree: traverse tree levels (log2 1M ~ 20) ~ 20 comparisons
The difference is dramatic at scale.
*/
Creating Indexes
-- Create an index on a single column
CREATE INDEX idx_orders_user_id ON orders (user_id);
-- Create a unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email ON users (email);
-- Composite index (multiple columns)
CREATE INDEX idx_orders_user_date ON orders (user_id, order_date);
-- This index helps queries that filter by user_id alone
-- or by user_id AND order_date, but NOT order_date alone
-- Partial index (PostgreSQL): index only a subset of rows
CREATE INDEX idx_active_users ON users (email)
WHERE status = 'active';
-- Expression index: index a computed value
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Drop an index
DROP INDEX idx_orders_user_id;
Index Types
-- B-tree (default): equality, range, sorting
CREATE INDEX idx_btree ON orders (amount);
-- Good for: =, <, >, <=, >=, BETWEEN, ORDER BY, LIKE 'prefix%'
-- Hash (PostgreSQL): equality only, faster than B-tree for exact matches
CREATE INDEX idx_hash ON users USING HASH (email);
-- Good for: = only (not <, >, BETWEEN, ORDER BY)
-- GIN (PostgreSQL): full-text search, JSONB, arrays
CREATE INDEX idx_gin ON articles USING GIN (to_tsvector('english', content));
CREATE INDEX idx_jsonb ON events USING GIN (metadata);
-- GiST (PostgreSQL): geometric data, full-text, range types
CREATE INDEX idx_gist ON locations USING GIST (coordinates);
When to Use Indexes (and When Not To)
- Index columns in WHERE clauses — if you frequently filter by
user_id, index it - Index columns used in JOINs — foreign keys should almost always be indexed
- Index columns in ORDER BY — avoids expensive sorting operations
- Do not index low-cardinality columns — a boolean column with only TRUE/FALSE values gains little from an index
- Do not index small tables — full scans on tables with fewer than 1000 rows are fast enough
- Do not over-index — each index slows down INSERT, UPDATE, and DELETE operations because the index must also be updated
- Primary keys are automatically indexed — no need to create a separate index
Views and CTEs (Common Table Expressions)
Views and CTEs help organize complex queries into readable, reusable components.
Views
A view is a named query stored in the database. It acts like a virtual table. When you query a view, the database runs the underlying query.
-- Create a view
CREATE VIEW active_users AS
SELECT id, name, email, created_at
FROM users
WHERE status = 'active';
-- Query the view like a table
SELECT * FROM active_users WHERE created_at >= '2026-01-01';
-- View for order summaries
CREATE VIEW order_summary AS
SELECT
u.name AS customer,
COUNT(o.id) AS total_orders,
SUM(o.amount) AS total_spent,
AVG(o.amount) AS avg_order_value,
MAX(o.order_date) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
-- Use the view
SELECT * FROM order_summary WHERE total_orders > 5 ORDER BY total_spent DESC;
-- Update or replace a view
CREATE OR REPLACE VIEW active_users AS
SELECT id, name, email, phone, created_at
FROM users
WHERE status = 'active' AND email IS NOT NULL;
-- Drop a view
DROP VIEW IF EXISTS order_summary;
CTEs (Common Table Expressions): The WITH Clause
A CTE is a temporary named result set that exists only for the duration of a single query. CTEs are defined with the WITH keyword and are especially useful for breaking complex queries into readable steps.
-- Basic CTE
WITH active_orders AS (
SELECT * FROM orders WHERE status = 'active'
)
SELECT u.name, ao.amount
FROM users u
INNER JOIN active_orders ao ON u.id = ao.user_id;
-- Multiple CTEs
WITH
monthly_totals AS (
SELECT
user_id,
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS total
FROM orders
GROUP BY user_id, DATE_TRUNC('month', order_date)
),
user_averages AS (
SELECT
user_id,
AVG(total) AS avg_monthly_spend
FROM monthly_totals
GROUP BY user_id
)
SELECT u.name, ua.avg_monthly_spend
FROM users u
INNER JOIN user_averages ua ON u.id = ua.user_id
ORDER BY ua.avg_monthly_spend DESC;
Recursive CTEs
Recursive CTEs can reference themselves, which is essential for traversing hierarchical or tree-structured data.
-- Organization hierarchy: find all employees under a manager
WITH RECURSIVE org_tree AS (
-- Base case: the starting employee
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE id = 1 -- Start from the CEO
UNION ALL
-- Recursive case: find direct reports
SELECT e.id, e.name, e.manager_id, t.level + 1
FROM employees e
INNER JOIN org_tree t ON e.manager_id = t.id
)
SELECT
REPEAT(' ', level - 1) || name AS org_chart,
level
FROM org_tree
ORDER BY level, name;
/* Result:
org_chart | level
----------------+------
Alice | 1
Bob | 2
Charlie | 2
Diana | 3
Eve | 3
*/
-- Find all ancestors of a category
WITH RECURSIVE category_path AS (
SELECT id, name, parent_id, name AS path
FROM categories
WHERE id = 42
UNION ALL
SELECT c.id, c.name, c.parent_id, c.name || ' > ' || cp.path
FROM categories c
INNER JOIN category_path cp ON c.id = cp.parent_id
)
SELECT path FROM category_path WHERE parent_id IS NULL;
Window Functions
Window functions perform calculations across a set of rows that are related to the current row. Unlike GROUP BY, which collapses rows into groups, window functions keep all individual rows in the output while adding computed columns. They are one of SQL's most powerful features.
ROW_NUMBER, RANK, DENSE_RANK
-- ROW_NUMBER: assign a unique sequential number to each row
SELECT
name,
category,
price,
ROW_NUMBER() OVER (ORDER BY price DESC) AS overall_rank
FROM products;
-- ROW_NUMBER with PARTITION BY: rank within each category
SELECT
name,
category,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS category_rank
FROM products;
/* Result:
name | category | price | category_rank
----------+-------------+--------+--------------
Laptop | Electronics | 999.99 | 1
Phone | Electronics | 699.99 | 2
Headphones| Electronics | 149.99 | 3
SQL Book | Books | 49.99 | 1
Novel | Books | 14.99 | 2
*/
-- RANK: same as ROW_NUMBER but ties get the same rank (with gaps)
-- DENSE_RANK: same as RANK but no gaps after ties
SELECT
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students;
/* score=95: RANK=1, DENSE_RANK=1
score=95: RANK=1, DENSE_RANK=1 (tie)
score=90: RANK=3, DENSE_RANK=2 (RANK skips 2, DENSE_RANK does not)
score=85: RANK=4, DENSE_RANK=3
*/
LEAD and LAG: Accessing Adjacent Rows
-- LAG: access the previous row's value
-- LEAD: access the next row's value
SELECT
order_date,
amount,
LAG(amount) OVER (ORDER BY order_date) AS prev_amount,
LEAD(amount) OVER (ORDER BY order_date) AS next_amount,
amount - LAG(amount) OVER (ORDER BY order_date) AS change
FROM orders
WHERE user_id = 1
ORDER BY order_date;
/* Result:
order_date | amount | prev_amount | next_amount | change
-----------+--------+-------------+-------------+-------
2026-01-15 | 59.99 | NULL | 24.50 | NULL
2026-01-22 | 24.50 | 59.99 | 149.00 | -35.49
2026-02-01 | 149.00 | 24.50 | NULL | 124.50
*/
-- Day-over-day revenue change
SELECT
order_date,
daily_revenue,
LAG(daily_revenue) OVER (ORDER BY order_date) AS prev_day_revenue,
ROUND(
(daily_revenue - LAG(daily_revenue) OVER (ORDER BY order_date))
/ LAG(daily_revenue) OVER (ORDER BY order_date) * 100, 1
) AS pct_change
FROM (
SELECT order_date, SUM(amount) AS daily_revenue
FROM orders
GROUP BY order_date
) daily;
Running Totals and Moving Averages
-- Running total (cumulative sum)
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
-- Running total partitioned by user
SELECT
user_id,
order_date,
amount,
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS user_running_total
FROM orders;
-- Moving average (last 3 orders)
SELECT
order_date,
amount,
AVG(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM orders;
-- Percentage of total
SELECT
category,
name,
price,
ROUND(price / SUM(price) OVER (PARTITION BY category) * 100, 1) AS pct_of_category
FROM products;
Window Frame Clauses
-- The OVER clause can specify a frame of rows:
-- ROWS BETWEEN ... AND ...
SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- All rows from the start to the current row (running total)
AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- 7-day moving average (current row + 6 preceding)
SUM(amount) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
-- Current row plus one before and one after
SUM(amount) OVER (ORDER BY date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
-- Current row to the end
Transactions
A transaction is a group of SQL operations that must all succeed or all fail as a unit. Transactions ensure data integrity when multiple related changes must be made together. If anything goes wrong, you can roll back all changes.
ACID Properties
Every database transaction is governed by four properties, known as ACID:
- Atomicity — All operations in the transaction succeed, or none of them do. There is no partial completion. If a transfer between bank accounts fails midway, both accounts are restored to their original state.
- Consistency — The transaction brings the database from one valid state to another. All constraints, foreign keys, and rules are satisfied after the transaction completes.
- Isolation — Concurrent transactions do not interfere with each other. Each transaction sees a consistent snapshot of the data, as if it were the only transaction running.
- Durability — Once a transaction is committed, the changes are permanent and survive system crashes, power failures, or any other failure.
BEGIN, COMMIT, ROLLBACK
-- Basic transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- Both updates are saved permanently
-- Transaction with rollback
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Check if the balance went negative
-- If something is wrong, roll back
ROLLBACK; -- Both updates are undone; the database is unchanged
-- Practical example: transfer funds with validation
BEGIN;
-- Deduct from sender
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
-- Check if balance went negative
DO $$
BEGIN
IF (SELECT balance FROM accounts WHERE id = 1) < 0 THEN
RAISE EXCEPTION 'Insufficient funds';
END IF;
END $$;
-- Credit receiver
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- Insert a record of the transfer
INSERT INTO transfers (from_account, to_account, amount, transferred_at)
VALUES (1, 2, 500, NOW());
COMMIT;
Savepoints
-- Savepoints let you partially roll back within a transaction
BEGIN;
INSERT INTO orders (user_id, amount) VALUES (1, 100);
SAVEPOINT order_created;
INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 99, 1);
-- Oops, product 99 does not exist and fails
ROLLBACK TO SAVEPOINT order_created;
-- The order INSERT is still intact; only the order_items INSERT was undone
INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 42, 1);
COMMIT;
Isolation Levels
Isolation levels control how much transactions can see each other's uncommitted changes. From least to most strict:
-- Read Uncommitted: can see other transactions' uncommitted changes (dirty reads)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Read Committed (PostgreSQL default): only sees committed changes
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Repeatable Read (MySQL default): same SELECT always returns same rows within a transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Serializable: strictest level, transactions execute as if sequential
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
/* Trade-off:
- Lower isolation = better performance, but potential anomalies
- Higher isolation = fewer anomalies, but more locking and potential deadlocks
For most web applications, Read Committed is sufficient.
Use Serializable only when absolute consistency is required
(e.g., financial transactions, inventory management).
*/
Performance Tips
Writing correct SQL is the first step. Writing fast SQL is what separates a working application from one that can handle real-world load. Here are the most impactful optimization strategies.
Use EXPLAIN to Understand Query Plans
-- See how the database will execute your query
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- With actual execution times (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
/* What to look for:
- Seq Scan (sequential scan): reading every row. Add an index if the table is large.
- Index Scan: using an index. This is what you want.
- Nested Loop: joining with a loop. Fine for small tables, slow for large ones.
- Hash Join: building a hash table for the join. Good for larger tables.
- Sort: sorting results. Can be avoided with an appropriate index.
- Rows: estimated vs actual row count. Large discrepancies indicate stale statistics.
*/
-- MySQL equivalent
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
Indexing Strategies
-- 1. Index columns used in WHERE, JOIN, and ORDER BY
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_date ON orders (order_date);
-- 2. Composite indexes for multi-column queries
-- The order of columns in the index matters!
CREATE INDEX idx_orders_user_date ON orders (user_id, order_date);
-- This helps: WHERE user_id = 1 AND order_date > '2026-01-01'
-- This helps: WHERE user_id = 1 (uses the first column)
-- This does NOT help: WHERE order_date > '2026-01-01' (skips first column)
-- 3. Covering indexes: include all needed columns to avoid table lookups
CREATE INDEX idx_orders_covering ON orders (user_id, order_date, amount);
-- A query for SELECT amount FROM orders WHERE user_id = 1
-- can be answered entirely from the index (index-only scan)
-- 4. Partial indexes for frequently filtered subsets
CREATE INDEX idx_active_orders ON orders (user_id)
WHERE status = 'active';
-- Smaller index, faster for queries that always filter by status = 'active'
Query Optimization Tips
-- 1. Avoid SELECT * in production code
SELECT id, name, email FROM users; -- Better: only fetch needed columns
-- 2. Use EXISTS instead of IN for large subqueries
-- Slow (evaluates entire subquery):
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);
-- Faster (stops at first match):
SELECT * FROM users u WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
-- 3. Avoid functions on indexed columns in WHERE
-- Bad (index cannot be used):
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- Good (use an expression index, or store normalized data):
CREATE INDEX idx_lower_email ON users (LOWER(email));
-- Or: compare with the original case if the column has consistent casing
-- 4. Use LIMIT for queries that only need a few rows
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;
-- 5. Batch large updates/deletes to avoid long locks
-- Instead of: DELETE FROM logs WHERE created_at < '2025-01-01';
-- Do it in batches:
DELETE FROM logs WHERE id IN (
SELECT id FROM logs WHERE created_at < '2025-01-01' LIMIT 10000
);
-- Repeat until no rows are deleted
-- 6. Use UNION ALL instead of UNION when duplicates are acceptable
-- UNION removes duplicates (requires sorting), UNION ALL does not
SELECT name FROM customers
UNION ALL
SELECT name FROM suppliers;
-- 7. Avoid correlated subqueries on large tables; rewrite as JOINs
-- Slow (executes subquery once per row):
SELECT name, (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) FROM users u;
-- Faster (single join operation):
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
Monitoring and Maintenance
-- Update table statistics so the query planner makes good choices
ANALYZE orders; -- PostgreSQL
ANALYZE TABLE orders; -- MySQL
-- Find slow queries (PostgreSQL: enable pg_stat_statements)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Find missing indexes: look for sequential scans on large tables
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan AND n_live_tup > 10000
ORDER BY seq_scan - idx_scan DESC;
-- Check index usage: find unused indexes that waste space and slow writes
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_%'
ORDER BY pg_relation_size(indexrelid) DESC;
Summary
SQL is a deep language, but most day-to-day development uses a consistent core set of features. Here is a practical framework for building your SQL skills:
- Start with SELECT, WHERE, ORDER BY, LIMIT — these cover 80% of read queries.
- Learn JOINs thoroughly — INNER JOIN and LEFT JOIN handle the vast majority of multi-table queries.
- Use GROUP BY and aggregates for reporting and analytics.
- Use CTEs (WITH clause) to break complex queries into readable steps. Prefer CTEs over deeply nested subqueries.
- Learn window functions for rankings, running totals, and row comparisons. They eliminate many self-join workarounds.
- Index your foreign keys and frequently filtered columns. Use EXPLAIN to verify your indexes are being used.
- Use transactions for any operation that modifies multiple rows or tables together.
- Always include WHERE in UPDATE and DELETE to avoid accidentally modifying the entire table.
SQL has been the standard database language for over 50 years, and it is not going anywhere. Every backend framework, ORM, data pipeline, and analytics tool builds on SQL underneath. Understanding it deeply makes you a better developer regardless of your stack.
Learn More
- SQL Formatter — format and beautify your SQL queries
- SQL Playground — test SQL queries interactively in the browser
- SQL Basics Cheat Sheet — quick reference for all essential SQL commands
Frequently Asked Questions
What is the difference between SQL and MySQL?
SQL (Structured Query Language) is the standard language for querying and managing relational databases. MySQL is a specific relational database management system (RDBMS) that uses SQL as its query language. Other RDBMS options that also use SQL include PostgreSQL, SQLite, SQL Server, and Oracle. The core SQL syntax is standardized, but each RDBMS adds its own extensions and features.
What is the difference between INNER JOIN and LEFT JOIN?
INNER JOIN returns only the rows where there is a match in both tables. LEFT JOIN returns all rows from the left table and the matching rows from the right table. If there is no match in the right table, the result contains NULL values for the right table's columns. LEFT JOIN is useful when you want to include records even if they have no related data in the joined table.
When should I use an index in SQL?
Use indexes on columns that appear frequently in WHERE clauses, JOIN conditions, and ORDER BY clauses. Indexes speed up reads but slow down writes (INSERT, UPDATE, DELETE) because the index must be updated. Avoid indexing columns with low cardinality (few unique values), very small tables, or columns that are rarely queried. Primary keys are automatically indexed.
What are window functions in SQL?
Window functions perform calculations across a set of rows related to the current row without collapsing them into a single output row like GROUP BY does. Common window functions include ROW_NUMBER(), RANK(), DENSE_RANK(), LEAD(), LAG(), SUM() OVER, and AVG() OVER. They use the OVER clause with optional PARTITION BY and ORDER BY to define the window of rows.
What does ACID mean in database transactions?
ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity means all operations in a transaction succeed or all fail. Consistency means a transaction brings the database from one valid state to another. Isolation means concurrent transactions do not interfere with each other. Durability means committed transactions survive system failures. ACID properties guarantee reliable database transactions.
What is a CTE (Common Table Expression) and when should I use one?
A CTE (Common Table Expression) is a temporary named result set defined using the WITH clause that exists only for the duration of a single query. CTEs improve readability by breaking complex queries into named logical steps, can be referenced multiple times in the same query, and support recursion for hierarchical data. Use CTEs when a subquery would be hard to read, when you need to reference the same derived table multiple times, or when traversing tree structures.