MongoDB: The Complete Guide for 2026
MongoDB is the most widely used NoSQL database. It stores data as flexible, JSON-like documents instead of rigid rows and columns, making it a natural fit for applications where data structures evolve over time. From startups shipping their first MVP to enterprises managing billions of documents, MongoDB powers a vast range of production workloads.
This guide covers everything you need to work with MongoDB effectively: from installation and CRUD operations to aggregation pipelines, indexing strategies, schema design patterns, Mongoose for Node.js, PyMongo for Python, transactions, replication, sharding, and performance tuning.
Table of Contents
- What is MongoDB and When to Use NoSQL
- Installation and Setup
- CRUD Operations
- Query Operators
- Indexing
- Aggregation Pipeline
- Schema Design and Data Modeling
- Mongoose ODM for Node.js
- PyMongo for Python
- Transactions and ACID
- Replication and Replica Sets
- Sharding Basics
- Performance Tuning
- MongoDB vs PostgreSQL
- Frequently Asked Questions
What is MongoDB and When to Use NoSQL
MongoDB is a document-oriented database that stores data in BSON (Binary JSON) format. Instead of tables with fixed columns, MongoDB uses collections of documents. Each document is a self-contained JSON-like object that can have its own unique structure.
// A MongoDB document (stored in BSON, displayed as JSON)
{
"_id": ObjectId("65a1b2c3d4e5f6789012abcd"),
"name": "Alice Chen",
"email": "alice@example.com",
"skills": ["JavaScript", "Python", "MongoDB"],
"address": { "city": "San Francisco", "state": "CA", "zip": "94102" },
"projects": [
{ "name": "API Gateway", "status": "active", "stars": 142 },
{ "name": "Dashboard", "status": "completed", "stars": 89 }
],
"createdAt": ISODate("2026-01-15T10:30:00Z")
}
Use MongoDB when: your data has a flexible or evolving schema, you need to store nested objects and arrays naturally, horizontal scaling is a priority, or your app works primarily with JSON. Common use cases include content management, real-time analytics, IoT data, product catalogs, and user profiles.
Use a relational database when: you need complex joins across many tables, strict referential integrity with foreign keys, or advanced SQL features like window functions and CTEs.
Installation and Setup
Local, Docker, and Atlas
# Ubuntu / Debian (MongoDB 7.0)
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg --dearmor
echo "deb [ signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] \
https://repo.mongodb.org/apt/ubuntu jammy/mongodb-org/7.0 multiverse" | \
sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
sudo apt update && sudo apt install -y mongodb-org
sudo systemctl start mongod && sudo systemctl enable mongod
# macOS (Homebrew)
brew tap mongodb/brew && brew install mongodb-community@7.0
brew services start mongodb-community@7.0
# Docker (quickest way to get started)
docker run -d --name mongodb -p 27017:27017 \
-e MONGO_INITDB_ROOT_USERNAME=admin \
-e MONGO_INITDB_ROOT_PASSWORD=secret \
-v mongodb_data:/data/db mongo:7
# MongoDB Atlas (managed cloud): sign up at cloud.mongodb.com
# Free tier (M0) provides 512 MB — enough for development
mongosh "mongodb+srv://cluster0.abc123.mongodb.net/mydb" --username myuser
mongosh Basics
mongosh # Connect to local MongoDB
show dbs // List all databases
use myapp // Switch to (or create) a database
show collections // List collections
db.stats() // Database statistics
db.users.countDocuments() // Count documents
exit // Quit
CRUD Operations
Create (Insert)
// Insert a single document
db.users.insertOne({
name: "Alice Chen", email: "alice@example.com",
age: 29, skills: ["JavaScript", "MongoDB"], createdAt: new Date()
})
// Insert multiple documents
db.users.insertMany([
{ name: "Bob Smith", email: "bob@example.com", age: 34, skills: ["Python"] },
{ name: "Carol Davis", email: "carol@example.com", age: 27, skills: ["Go", "Redis"] }
])
Read (Find)
db.users.find() // All documents
db.users.find({ age: { $gte: 30 } }) // With filter
db.users.findOne({ email: "alice@example.com" }) // Single document
db.users.find({}, { name: 1, email: 1, _id: 0 }) // Projection (select fields)
db.users.find().sort({ age: -1 }).skip(10).limit(5) // Sort, paginate
db.users.countDocuments({ age: { $gte: 30 } }) // Count matches
Update
// Update one document — $set changes specific fields
db.users.updateOne(
{ email: "alice@example.com" },
{ $set: { age: 30 }, $inc: { loginCount: 1 } }
)
// Update many documents
db.users.updateMany({ age: { $lt: 30 } }, { $set: { tier: "junior" } })
// Array operations
db.users.updateOne({ email: "alice@example.com" }, { $addToSet: { skills: "TypeScript" } })
db.users.updateOne({ email: "alice@example.com" }, { $pull: { skills: "MongoDB" } })
// Upsert: insert if not found, update if found
db.users.updateOne(
{ email: "new@example.com" },
{ $set: { name: "New User", createdAt: new Date() } },
{ upsert: true }
)
Delete
db.users.deleteOne({ email: "dan@example.com" }) // Delete one
db.users.deleteMany({ age: { $lt: 25 } }) // Delete many
db.users.deleteMany({}) // Delete all documents
db.users.drop() // Drop the entire collection
Query Operators
// COMPARISON
db.users.find({ age: { $eq: 30 } }) // Equal
db.users.find({ age: { $ne: 30 } }) // Not equal
db.users.find({ age: { $gt: 25, $lt: 40 } }) // Range (greater than, less than)
db.users.find({ age: { $gte: 25, $lte: 40 } }) // Inclusive range
db.users.find({ age: { $in: [25, 30, 35] } }) // In array of values
db.users.find({ age: { $nin: [25, 30] } }) // Not in array
// LOGICAL
db.users.find({ $and: [{ age: { $gte: 25 } }, { role: "admin" }] })
db.users.find({ $or: [{ age: { $lt: 25 } }, { age: { $gt: 60 } }] })
db.users.find({ age: { $not: { $gt: 40 } } })
// ELEMENT AND STRING
db.users.find({ phone: { $exists: true } }) // Field exists
db.users.find({ age: { $type: "number" } }) // BSON type check
db.users.find({ name: { $regex: /^alice/i } }) // Regex (case-insensitive)
// ARRAY
db.users.find({ skills: "MongoDB" }) // Array contains value
db.users.find({ skills: { $all: ["JS", "MongoDB"] } }) // Contains all values
db.users.find({ skills: { $size: 3 } }) // Array length is exactly 3
db.users.find({ projects: { $elemMatch: { status: "active", stars: { $gt: 100 } } } })
// NESTED DOCUMENTS (dot notation)
db.users.find({ "address.city": "San Francisco" })
db.users.find({ "address.state": { $in: ["CA", "NY", "TX"] } })
Indexing
Indexes are the single most important tool for MongoDB performance. Without them, MongoDB must scan every document in a collection to find matches.
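The scan-versus-index difference is easy to see outside the database. The sketch below (plain Python over hypothetical in-memory data, not the PyMongo API) contrasts a collection scan, which touches every document, with an index, which is a precomputed map from field value to matching documents:

```python
# A "collection": just a list of documents (dicts)
users = [{"email": f"user{i}@example.com", "age": 20 + i % 40} for i in range(10_000)]

# Collection scan (COLLSCAN): examine every document -> O(n) per query
def coll_scan(email):
    return [u for u in users if u["email"] == email]

# "Index": build a value -> documents map once, then look up directly
email_index = {}
for u in users:
    email_index.setdefault(u["email"], []).append(u)

def ix_scan(email):
    return email_index.get(email, [])

# Both return the same result; only the amount of work differs
assert coll_scan("user42@example.com") == ix_scan("user42@example.com")
```

A real B-tree index additionally keeps its keys sorted, which is why it can also serve range queries and sorts, not just equality lookups.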
// SINGLE FIELD INDEX
db.users.createIndex({ email: 1 }) // Ascending
db.users.createIndex({ createdAt: -1 }) // Descending
// UNIQUE INDEX (enforces uniqueness)
db.users.createIndex({ email: 1 }, { unique: true })
// COMPOUND INDEX (multiple fields — field order matters!)
db.orders.createIndex({ customerId: 1, createdAt: -1 })
// Supports: find({customerId: X}).sort({createdAt: -1}) and find({customerId: X})
// Does NOT help: find({createdAt: ...}) alone (skips the leading field of the index)
// TEXT INDEX (full-text search)
db.posts.createIndex({ title: "text", body: "text" })
db.posts.find({ $text: { $search: "mongodb aggregation" } })
db.posts.find(
{ $text: { $search: "mongodb tutorial" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
// TTL INDEX (auto-delete documents after expiry)
db.sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })
db.logs.createIndex({ createdAt: 1 }, { expireAfterSeconds: 2592000 }) // 30 days
// PARTIAL INDEX (only index matching documents)
db.orders.createIndex({ total: -1 }, { partialFilterExpression: { status: "completed" } })
// WILDCARD INDEX (index all fields in a subdocument)
db.products.createIndex({ "attributes.$**": 1 })
// MANAGE INDEXES
db.users.getIndexes() // List all indexes
db.users.dropIndex("email_1") // Drop by name
// CHECK IF QUERY USES AN INDEX
db.users.find({ email: "alice@example.com" }).explain("executionStats")
// Look for "IXSCAN" (good) vs "COLLSCAN" (bad — needs an index)
Aggregation Pipeline
The aggregation pipeline is MongoDB's most powerful feature for data processing. Documents flow through stages that filter, group, reshape, sort, and join data.
// Revenue report: filter, group, sort
db.orders.aggregate([
{ $match: { status: "completed", createdAt: { $gte: ISODate("2026-01-01") } } },
{ $group: {
_id: "$customerId",
totalSpent: { $sum: "$amount" },
orderCount: { $sum: 1 },
avgOrder: { $avg: "$amount" }
}},
{ $sort: { totalSpent: -1 } },
{ $limit: 20 }
])
// $project: reshape and compute fields
db.users.aggregate([
{ $project: {
name: 1, email: 1,
skillCount: { $size: "$skills" },
hasMongoSkill: { $in: ["MongoDB", "$skills"] }
}}
])
// $lookup: join collections (like LEFT JOIN in SQL)
db.orders.aggregate([
{ $lookup: {
from: "users", localField: "customerId",
foreignField: "_id", as: "customer"
}},
{ $unwind: "$customer" },
{ $project: { amount: 1, customerName: "$customer.name" } }
])
// $unwind + $group: count occurrences of array elements
db.users.aggregate([
{ $unwind: "$skills" },
{ $group: { _id: "$skills", count: { $sum: 1 } } },
{ $sort: { count: -1 } }
])
// $bucket: group into ranges
db.users.aggregate([
{ $bucket: {
groupBy: "$age", boundaries: [18, 25, 35, 45, 65], default: "65+",
output: { count: { $sum: 1 }, avgAge: { $avg: "$age" } }
}}
])
// $facet: run multiple pipelines in parallel on the same data
db.products.aggregate([
{ $facet: {
topRated: [{ $sort: { rating: -1 } }, { $limit: 5 }],
categoryStats: [{ $group: { _id: "$category", count: { $sum: 1 } } }]
}}
])
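To build intuition for how documents flow through stages, the $match → $group → $sort shape of the revenue report can be approximated in plain Python over an in-memory list (a hypothetical illustration of the mechanics, not the PyMongo API):

```python
orders = [
    {"customerId": "c1", "status": "completed", "amount": 120},
    {"customerId": "c2", "status": "completed", "amount": 80},
    {"customerId": "c1", "status": "completed", "amount": 40},
    {"customerId": "c3", "status": "pending",   "amount": 999},
]

# Stage 1 ($match): filter documents before anything else sees them
matched = [o for o in orders if o["status"] == "completed"]

# Stage 2 ($group): accumulate per group key, like $sum and a count
groups = {}
for o in matched:
    g = groups.setdefault(o["customerId"],
                          {"_id": o["customerId"], "totalSpent": 0, "orderCount": 0})
    g["totalSpent"] += o["amount"]
    g["orderCount"] += 1

# Stage 3 ($sort): order the grouped results, highest spender first
report = sorted(groups.values(), key=lambda g: -g["totalSpent"])
# report[0] is customer c1 with totalSpent 160 across 2 orders
```

Each stage consumes the previous stage's output, which is why putting $match first shrinks the work every later stage has to do.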
Schema Design and Data Modeling
MongoDB schema design follows one rule: data that is accessed together should be stored together. The choice between embedding and referencing depends on your access patterns.
Embedded Documents vs References
// EMBED when: one-to-few, always read together, child doesn't need independent access
// Example: User with embedded address
{
_id: ObjectId("..."), name: "Alice Chen", email: "alice@example.com",
address: { street: "123 Main St", city: "San Francisco", state: "CA" },
preferences: { theme: "dark", language: "en" }
}
// REFERENCE when: one-to-many, data is shared, needs independent access
// users collection
{ _id: ObjectId("user1"), name: "Alice", email: "alice@example.com" }
// posts collection (references user)
{ _id: ObjectId("post1"), title: "MongoDB Guide", authorId: ObjectId("user1") }
// HYBRID PATTERN: embed frequently-read fields, reference for full data
{
_id: ObjectId("post1"), title: "MongoDB Guide",
authorId: ObjectId("user1"),
authorName: "Alice", // Denormalized for read performance
authorAvatar: "/img/alice.jpg" // Avoids $lookup for common display
}
Common Patterns
// BUCKET PATTERN: group time-series data to reduce document count
{
sensorId: "sensor-001",
date: ISODate("2026-02-12T14:00:00Z"),
readings: [
{ ts: ISODate("2026-02-12T14:00:12Z"), temp: 22.5 },
{ ts: ISODate("2026-02-12T14:01:30Z"), temp: 22.6 }
],
count: 2, sum_temp: 45.1
}
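The bucket pattern above is just grouping logic applied at write time. A plain-Python sketch (hypothetical in-memory readings, not the PyMongo API) of how an ingest step might fold raw sensor readings into hourly buckets with precomputed rollups:

```python
from datetime import datetime, timezone

readings = [
    {"sensor": "sensor-001", "ts": datetime(2026, 2, 12, 14, 0, 12, tzinfo=timezone.utc), "temp": 22.5},
    {"sensor": "sensor-001", "ts": datetime(2026, 2, 12, 14, 1, 30, tzinfo=timezone.utc), "temp": 22.6},
    {"sensor": "sensor-001", "ts": datetime(2026, 2, 12, 15, 5, 0,  tzinfo=timezone.utc), "temp": 23.1},
]

buckets = {}
for r in readings:
    # Bucket key: sensor + the reading's hour, truncated
    hour = r["ts"].replace(minute=0, second=0, microsecond=0)
    b = buckets.setdefault((r["sensor"], hour),
                           {"sensorId": r["sensor"], "date": hour,
                            "readings": [], "count": 0, "sum_temp": 0.0})
    b["readings"].append({"ts": r["ts"], "temp": r["temp"]})
    b["count"] += 1
    b["sum_temp"] += r["temp"]  # rollups kept alongside raw data make reads cheap

# Three raw readings collapse into two bucket documents (14:00 and 15:00)
```

In MongoDB the same fold is typically done with a single upsert per reading using $push and $inc, so the bucket document grows atomically.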
// POLYMORPHIC PATTERN: different shapes in one collection
{ type: "book", title: "MongoDB in Action", author: "Kyle Banker", pages: 480 }
{ type: "movie", title: "The Matrix", director: "Wachowskis", runtime: 136 }
Mongoose ODM for Node.js
Mongoose provides schema validation, middleware hooks, and an elegant API for working with MongoDB in Node.js applications.
// npm install mongoose
const mongoose = require('mongoose');
// Note: top-level await requires an ES module; in CommonJS, wrap this in an async function
await mongoose.connect('mongodb://localhost:27017/myapp');
// Define a Schema with validation
const userSchema = new mongoose.Schema({
name: { type: String, required: true, trim: true },
email: { type: String, required: true, unique: true, lowercase: true },
age: { type: Number, min: 13, max: 120 },
role: { type: String, enum: ['user', 'admin', 'moderator'], default: 'user' },
skills: [String],
profile: { bio: { type: String, maxlength: 500 }, avatar: String },
isActive: { type: Boolean, default: true }
}, { timestamps: true }); // Adds createdAt and updatedAt
// Indexes — email already gets a unique index from `unique: true` in the schema,
// so don't redeclare it (Mongoose warns about duplicate index definitions)
userSchema.index({ name: 'text', 'profile.bio': 'text' });
// Instance and static methods
userSchema.methods.getPublicProfile = function() {
return { name: this.name, skills: this.skills };
};
userSchema.statics.findByEmail = function(email) {
return this.findOne({ email: email.toLowerCase() });
};
// Middleware (hooks)
userSchema.pre('save', function(next) {
if (this.isModified('email')) this.email = this.email.toLowerCase();
next();
});
const User = mongoose.model('User', userSchema);
// CRUD operations
const alice = await User.create({ name: 'Alice', email: 'alice@example.com', skills: ['JS'] });
const users = await User.find({ age: { $gte: 25 } }).sort({ name: 1 }).limit(10);
await User.updateOne({ _id: alice._id }, { $push: { skills: 'MongoDB' } });
await User.deleteOne({ _id: alice._id });
PyMongo for Python
# pip install pymongo
from pymongo import MongoClient
from datetime import datetime, timezone
client = MongoClient("mongodb://localhost:27017/")
db = client["myapp"]
users = db["users"]
# Insert
result = users.insert_one({
"name": "Alice Chen", "email": "alice@example.com",
"age": 29, "skills": ["Python", "MongoDB"],
"created_at": datetime.now(timezone.utc)
})
# Find
user = users.find_one({"email": "alice@example.com"})
cursor = users.find({"age": {"$gte": 25}}, {"name": 1, "_id": 0}).sort("name", 1).limit(10)
# Update and Delete
users.update_one({"email": "alice@example.com"}, {"$set": {"age": 30}})
users.delete_one({"email": "alice@example.com"})
# Aggregation
pipeline = [
{"$match": {"age": {"$gte": 25}}},
{"$group": {"_id": None, "avg_age": {"$avg": "$age"}, "count": {"$sum": 1}}}
]
results = list(users.aggregate(pipeline))
# Create indexes
users.create_index("email", unique=True)
users.create_index([("name", 1), ("age", -1)])
Transactions and ACID
Single-document operations in MongoDB are always atomic. For operations spanning multiple documents or collections, MongoDB supports multi-document ACID transactions (4.0+ for replica sets, 4.2+ for sharded clusters).
// Multi-document transaction in mongosh
const session = db.getMongo().startSession();
session.startTransaction();
try {
const accounts = session.getDatabase("bank").accounts;
accounts.updateOne({ _id: "account-A" }, { $inc: { balance: -500 } }, { session });
accounts.updateOne({ _id: "account-B" }, { $inc: { balance: 500 } }, { session });
session.getDatabase("bank").transfers.insertOne(
{ from: "A", to: "B", amount: 500, date: new Date() }, { session }
);
session.commitTransaction();
} catch (error) {
session.abortTransaction();
throw error;
} finally { session.endSession(); }
// With Mongoose (Node.js)
const session = await mongoose.startSession();
await session.withTransaction(async () => {
await Account.updateOne({ _id: 'A' }, { $inc: { balance: -500 } }, { session });
await Account.updateOne({ _id: 'B' }, { $inc: { balance: 500 } }, { session });
});
session.endSession();
// With PyMongo (Python)
with client.start_session() as session:
    with session.start_transaction():
        db.accounts.update_one({"_id": "A"}, {"$inc": {"balance": -500}}, session=session)
        db.accounts.update_one({"_id": "B"}, {"$inc": {"balance": 500}}, session=session)
Key point: Design your schema to minimize the need for multi-document transactions. Embedding related data in a single document gives you atomic operations without transaction overhead.
Replication and Replica Sets
A replica set is a group of MongoDB servers maintaining the same data. One node is the primary (handles writes), the others are secondaries (replicate data). If the primary fails, an automatic election promotes a secondary, typically within about 10 seconds (the default election timeout).
// Initialize a 3-node replica set
rs.initiate({
_id: "myReplicaSet",
members: [
{ _id: 0, host: "mongo1:27017" },
{ _id: 1, host: "mongo2:27017" },
{ _id: 2, host: "mongo3:27017" }
]
})
rs.status() // Check replica set health
// READ PREFERENCE options:
// primary — always read from primary (default, strongest consistency)
// primaryPreferred — primary, fallback to secondary
// secondary — read from secondaries (eventual consistency, offloads primary)
// secondaryPreferred — secondary, fallback to primary
// nearest — lowest-latency node
// Connection string with replica set (name must match the _id used in rs.initiate)
// mongodb://mongo1:27017,mongo2:27017,mongo3:27017/mydb?replicaSet=myReplicaSet&readPreference=secondaryPreferred
// WRITE CONCERN: how many nodes must acknowledge a write
db.users.insertOne({ name: "Alice" }, { writeConcern: { w: "majority", wtimeout: 5000 } })
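Behind w: "majority" is simple vote counting. A sketch of the acknowledgement arithmetic (plain Python; an illustration of the rule, not server internals):

```python
def majority(replica_set_size):
    # Majority = more than half the voting members
    return replica_set_size // 2 + 1

def write_acknowledged(acks, replica_set_size, w="majority"):
    # A write succeeds once `w` nodes (or a majority) have acknowledged it
    needed = majority(replica_set_size) if w == "majority" else int(w)
    return acks >= needed

# In a 3-node set, 2 acknowledgements satisfy w: "majority"
assert majority(3) == 2
assert write_acknowledged(2, 3) is True
assert write_acknowledged(1, 3) is False
assert write_acknowledged(1, 3, w=1) is True
```

This arithmetic is also why replica sets usually have an odd number of members: a 4-node set needs 3 acknowledgements for a majority, so the fourth node adds no write availability over a 3-node set.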
Sharding Basics
Sharding distributes data across multiple servers for horizontal scaling. Each shard holds a subset of the data, and MongoDB automatically balances and routes queries.
// Sharding architecture: mongos (router) + config servers + shards (replica sets)
sh.enableSharding("myapp")
// RANGE SHARDING: distributes by value ranges
sh.shardCollection("myapp.orders", { customerId: 1 })
// HASHED SHARDING: even distribution via hash function
sh.shardCollection("myapp.logs", { _id: "hashed" })
sh.status() // View shard distribution
// CHOOSING A SHARD KEY (critical, cannot be changed easily):
// Good properties: high cardinality, even distribution, query isolation
// Good: { customerId: 1 } — queries target specific customers on one shard
// Good: { _id: "hashed" } — even writes across all shards
// Bad: { status: 1 } — low cardinality, creates hot spots
// Bad: { createdAt: 1 } — monotonically increasing, all writes hit one shard
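The hot-spot problem with monotonically increasing keys is easy to demonstrate. A sketch of range versus hashed routing (plain Python; real chunk placement is managed by the balancer, this only illustrates the distribution):

```python
import hashlib

NUM_SHARDS = 3

def hashed_shard(key):
    # Hashed sharding: hash the key value, then route by the hash
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % NUM_SHARDS

def range_shard(key, boundaries=(1000, 2000)):
    # Range sharding: route by which boundary interval the key falls into
    for shard, bound in enumerate(boundaries):
        if key < bound:
            return shard
    return len(boundaries)

# Monotonically increasing keys (e.g. timestamps) under range sharding:
# every new key is past the last boundary, so ALL writes land on the final shard
hot = {range_shard(k) for k in range(2000, 2100)}
assert hot == {2}  # one hot shard takes every write

# The same keys under hashed sharding spread across all shards
spread = {hashed_shard(k) for k in range(2000, 2100)}
assert spread == {0, 1, 2}
```

The trade-off: hashed sharding evens out writes but scatters range queries across every shard, while range sharding keeps related keys together at the cost of potential hot spots.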
Performance Tuning
// 1. ANALYZE QUERIES WITH explain()
db.users.find({ email: "alice@example.com" }).explain("executionStats")
// Check: totalDocsExamined (close to nReturned?), stage (IXSCAN vs COLLSCAN)
// 2. CREATE INDEXES FOR YOUR QUERY PATTERNS
db.orders.createIndex({ customerId: 1, createdAt: -1 })
// 3. USE PROJECTION — return only needed fields
db.users.find({ age: { $gte: 25 } }, { name: 1, email: 1 }) // Not the whole document
// 4. PUT $match EARLY IN AGGREGATION PIPELINES
// Bad: join then filter (processes everything)
db.orders.aggregate([
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
{ $match: { status: "completed" } } // Too late!
])
// Good: filter first, then join
db.orders.aggregate([
{ $match: { status: "completed" } }, // Filter early — much faster
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } }
])
// 5. MONITOR SLOW QUERIES
db.setProfilingLevel(1, { slowms: 100 }) // Log queries slower than 100ms
db.system.profile.find().sort({ ts: -1 }).limit(5)
// 6. CHECK SERVER HEALTH
db.serverStatus().connections // Active connections
db.serverStatus().opcounters // Operation counts
db.currentOp() // Currently running operations
MongoDB vs PostgreSQL
| Feature | MongoDB | PostgreSQL |
|---|---|---|
| Data Model | Documents (BSON/JSON) | Tables with rows and columns |
| Schema | Flexible, schema-optional | Strict, enforced by DDL |
| Query Language | MQL (MongoDB Query Language) | SQL (standard, full-featured) |
| Joins | $lookup (aggregation only) | Full JOIN support (INNER, LEFT, FULL, LATERAL) |
| Transactions | Multi-document (since 4.0) | Full ACID, mature MVCC |
| Horizontal Scaling | Built-in sharding | Citus extension or manual partitioning |
| Nested Data | Native embedded documents | JSONB (powerful but secondary model) |
| Full-Text Search | Text indexes, Atlas Search | Built-in tsvector/tsquery with ranking |
| Best For | Flexible schemas, rapid iteration, horizontal scale | Complex queries, data integrity, advanced SQL |
Choose MongoDB when: your data is document-centric with nested objects and arrays, you need schema flexibility for rapid iteration, or you need built-in horizontal scaling. MongoDB is excellent for content management, real-time analytics, IoT, and catalog-style data.
Choose PostgreSQL when: your data is highly relational with many cross-table joins, you need strict data integrity with foreign keys and constraints, or you want advanced SQL features like window functions and CTEs.
Summary
MongoDB gives you a flexible, scalable document database that maps naturally to how applications work with data. Here is a roadmap for building your MongoDB expertise:
- Master CRUD and query operators — insertOne, find, updateOne, deleteOne, and the $ operators handle most daily work.
- Design schemas around access patterns — embed data that is read together, reference data that is shared or large.
- Add indexes for every query pattern — use explain() to verify queries use indexes instead of collection scans.
- Learn the aggregation pipeline — $match, $group, $project, $lookup, and $unwind cover analytics and reporting.
- Use an ODM for your language — Mongoose for Node.js or PyMongo for Python add validation and convenience.
- Deploy with replica sets — high availability and automatic failover from day one.
- Plan for scale with sharding — choose a good shard key early if you expect massive growth.
Frequently Asked Questions
When should I use MongoDB instead of a relational database?
Use MongoDB when your data has a flexible or evolving schema, when you need to store nested objects or arrays naturally, when horizontal scaling is a priority, or when your application works primarily with JSON-like documents. Common use cases include content management systems, real-time analytics, IoT data, product catalogs with varying attributes, and user profiles. Stick with a relational database like PostgreSQL when you need complex joins, strict ACID transactions across multiple tables, or when your data fits naturally into a tabular structure with well-defined relationships.
What is the MongoDB aggregation pipeline?
The aggregation pipeline is MongoDB's framework for data processing and transformation. Documents pass through a sequence of stages, each performing an operation like filtering ($match), grouping ($group), reshaping ($project), sorting ($sort), joining ($lookup), or unwinding arrays ($unwind). Stages are chained together in an array and processed in order. The pipeline can handle complex analytics, reporting, and data transformation that would require multiple queries or application-level processing in other systems.
How do I design schemas in MongoDB?
MongoDB schema design follows the principle of "data that is accessed together should be stored together." Embed related data directly in the document when the relationship is one-to-few, the child data is always read with the parent, and the embedded data does not grow unboundedly. Use references (storing ObjectIds) when the related data is large, shared across many documents, or needs independent access. Avoid deeply nested structures beyond 3-4 levels. Use the $lookup aggregation stage for reference-based joins when needed.
Does MongoDB support transactions?
Yes. Since MongoDB 4.0, multi-document transactions are fully supported for replica sets, and since 4.2, they work across sharded clusters. Transactions provide ACID guarantees: you can read and write to multiple documents and collections within a single atomic transaction. However, MongoDB's document model often eliminates the need for multi-document transactions because related data can be embedded in a single document, and single-document operations are always atomic. Use transactions only when you genuinely need atomic writes across multiple documents.
What is the difference between MongoDB and PostgreSQL?
MongoDB is a document database that stores data as flexible JSON-like documents (BSON), while PostgreSQL is a relational database that stores data in structured tables with rows and columns. MongoDB excels at flexible schemas, horizontal scaling via sharding, and storing nested data naturally. PostgreSQL excels at complex joins, strict data integrity with constraints and foreign keys, and advanced SQL features like window functions and CTEs. MongoDB scales horizontally more easily; PostgreSQL scales vertically and offers more query power. Many teams now use PostgreSQL's JSONB for document-style storage within a relational database.