Elasticsearch: The Complete Guide for 2026
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It stores JSON documents, indexes every field automatically, and returns search results in milliseconds — even across billions of documents. From powering site search at Wikipedia and GitHub to processing petabytes of logs at Netflix and Uber, Elasticsearch is the industry standard for full-text search, log analytics, and real-time aggregations.
This guide covers Elasticsearch from core concepts through production deployment: indexes, mappings, every query type you will use, aggregations, analyzers, performance tuning, Kibana integration, security, and how it compares to alternatives.
Table of Contents
- What is Elasticsearch
- Core Concepts
- Setting Up Elasticsearch
- CRUD Operations
- Search Queries
- Aggregations
- Mappings and Analyzers
- Full-Text Search Best Practices
- Index Templates and Lifecycle Management
- Performance Tuning
- Elasticsearch with Kibana
- Security
- Common Pitfalls and Troubleshooting
- Elasticsearch vs Alternatives
- Frequently Asked Questions
1. What is Elasticsearch
Elasticsearch is an open-source, distributed, RESTful search engine. You interact with it entirely through HTTP endpoints that accept and return JSON. Unlike traditional databases optimized for transactional writes, Elasticsearch is optimized for search: it builds inverted indexes on your data so full-text queries, filtering, and aggregations run in milliseconds.
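To make the inverted-index idea concrete, here is a minimal Python sketch. It is a toy model, not how Lucene stores data: the `tokenize` helper is a crude stand-in for a real analyzer, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (crude analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_all_terms(index, query):
    """Return IDs of documents that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "Wireless Keyboard",
    2: "USB-C Hub",
    3: "Wireless gaming keyboard with USB receiver",
}
index = build_inverted_index(docs)
print(index["wireless"])                        # [1, 3]
print(search_all_terms(index, "usb keyboard"))  # [3]
```

A lookup touches only the posting lists for the query terms rather than scanning every document, which is why term lookups stay fast as the collection grows.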
When to use Elasticsearch:
- Full-text search — product search, site search, document search with relevance ranking, autocomplete, and fuzzy matching
- Log and event analytics — centralized logging with the ELK Stack (Elasticsearch, Logstash, Kibana) for real-time monitoring
- Real-time aggregations — dashboards showing counts, averages, histograms, and trends across millions of records
- Geospatial queries — find nearby locations, calculate distances, filter by bounding box
- Application performance monitoring — store and query traces, metrics, and spans
2. Core Concepts
- Index — a collection of related documents, similar to a database table. An index has a mapping that defines field types and analyzers.
- Document — a JSON object stored in an index. Each document has a unique _id and is the basic unit of data.
- Mapping — defines how fields are stored and indexed. Specifies field types (text, keyword, integer, date), analyzers, and whether fields are searchable.
- Shard — an index is split into shards distributed across nodes. Each shard is a self-contained Lucene index. Sharding enables horizontal scaling.
- Replica — a copy of a primary shard on a different node. Replicas provide fault tolerance and increase read throughput.
- Node — a single Elasticsearch server instance. Nodes join a cluster and can hold data, coordinate queries, or serve as the master.
- Cluster — one or more nodes working together, sharing data and load. A cluster has a single master node that manages index metadata.
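How a document ends up on a particular shard can be sketched as "hash the routing value, then take it modulo the number of primary shards". The Python below is a simplified illustration under that assumption: `pick_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in for the hash function Elasticsearch uses internally.

```python
import zlib

def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Pick a shard for a routing value (by default the document _id)."""
    # zlib.crc32 is a stand-in hash, chosen only because it is deterministic
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards

for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the result depends on the shard count, changing the number of primary shards would send existing IDs to different shards. That is one reason the primary shard count is fixed when an index is created.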
3. Setting Up Elasticsearch
Docker (Recommended for Development)
# Single node for development
docker run -d --name elasticsearch \
-p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-v es-data:/usr/share/elasticsearch/data \
docker.elastic.co/elasticsearch/elasticsearch:8.17.0
# Verify it is running
curl http://localhost:9200
Install on Ubuntu/Debian
# Import the Elastic GPG key and add the repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] \
https://artifacts.elastic.co/packages/8.x/apt stable main" | \
sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update && sudo apt install elasticsearch
sudo systemctl enable elasticsearch && sudo systemctl start elasticsearch
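# Security is enabled by default on package installs, so authenticate as the elastic user
# (-k skips verification of the self-signed certificate; curl prompts for the password)
curl -k -u elastic https://localhost:9200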
4. CRUD Operations
Elasticsearch uses a RESTful API. All operations are HTTP requests with JSON bodies.
Create an Index
# Create an index with settings
PUT /products
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
Index (Create/Update) a Document
# Index a document with an explicit ID
PUT /products/_doc/1
{
"name": "Wireless Keyboard",
"category": "electronics",
"price": 49.99,
"in_stock": true,
"created_at": "2026-02-12"
}
# Auto-generate an ID
POST /products/_doc
{
"name": "USB-C Hub",
"category": "electronics",
"price": 29.99,
"in_stock": true
}
Read a Document
# Get a document by ID
GET /products/_doc/1
# Get only specific fields
GET /products/_doc/1?_source_includes=name,price
Update a Document
# Partial update (merges fields)
POST /products/_update/1
{
"doc": { "price": 44.99, "in_stock": false }
}
# Update with script
POST /products/_update/1
{
"script": {
"source": "ctx._source.price -= params.discount",
"params": { "discount": 5 }
}
}
Delete a Document
# Delete by ID
DELETE /products/_doc/1
# Delete by query
POST /products/_delete_by_query
{
"query": {
"term": { "in_stock": false }
}
}
5. Search Queries
Elasticsearch queries fall into two categories: full-text queries that analyze the search term and score results by relevance, and term-level queries that look for exact values without analysis.
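The difference between the two categories is easiest to see with a small simulation. This Python sketch assumes a text field whose analyzer only splits on non-word characters and lowercases (a rough approximation of the standard analyzer). `term_query` and `match_query` are illustrative functions, not client library calls.

```python
import re

def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]

# Terms stored in the inverted index for a text field holding "Wireless Keyboard"
indexed_terms = set(standard_like_analyze("Wireless Keyboard"))

def term_query(value):
    # Term-level: the value is looked up as-is, with no analysis
    return value in indexed_terms

def match_query(value):
    # Full-text: the query string is analyzed first; terms are combined with OR by default
    return any(t in indexed_terms for t in standard_like_analyze(value))

print(term_query("Wireless"))         # False: the index holds "wireless", not "Wireless"
print(term_query("wireless"))         # True
print(match_query("WIRELESS mouse"))  # True: "wireless" matches after analysis
```

This is why term-level queries are normally paired with keyword fields, where the stored value is not analyzed either.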
Match Query (Full-Text)
# Search for documents containing "wireless keyboard"
GET /products/_search
{
"query": {
"match": {
"name": "wireless keyboard"
}
}
}
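# Matches "Wireless Keyboard", "keyboard wireless", "wireless gaming keyboard"
# Terms are combined with OR by default, so a document containing only "wireless" also matches (usually with a lower score)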
Multi-Match Query
# Search across multiple fields
GET /products/_search
{
"query": {
"multi_match": {
"query": "wireless keyboard",
"fields": ["name^3", "description", "category"],
"type": "best_fields"
}
}
}
# ^3 boosts matches in "name" by 3x
Bool Query (Combine Conditions)
# Combine multiple conditions
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "match": { "name": "keyboard" } }
],
"filter": [
{ "term": { "category": "electronics" } },
{ "range": { "price": { "gte": 20, "lte": 100 } } }
],
"should": [
{ "term": { "in_stock": true } }
],
"must_not": [
{ "term": { "category": "refurbished" } }
]
}
}
}
# must: required, contributes to score
# filter: required, does NOT contribute to score (faster, cached)
# should: optional, boosts score if matched
# must_not: excludes documents
Term Query (Exact Match)
# Exact match on keyword fields (no analysis)
GET /products/_search
{
"query": {
"term": { "category": "electronics" }
}
}
# Multiple exact values
GET /products/_search
{
"query": {
"terms": { "category": ["electronics", "accessories"] }
}
}
Range Query
# Numeric range
GET /products/_search
{
"query": {
"range": {
"price": { "gte": 10, "lte": 50 }
}
}
}
# Date range
GET /logs/_search
{
"query": {
"range": {
"timestamp": {
"gte": "2026-02-01",
"lte": "2026-02-12",
"format": "yyyy-MM-dd"
}
}
}
}
Wildcard and Prefix Queries
# Wildcard: * matches any characters, ? matches one
GET /products/_search
{
"query": {
"wildcard": { "name": "key*" }
}
}
# Prefix: faster than wildcard for starts-with
GET /products/_search
{
"query": {
"prefix": { "name.keyword": "Wire" }
}
}
Pagination and Sorting
# Paginate results
GET /products/_search
{
"query": { "match_all": {} },
"from": 0,
"size": 20,
"sort": [
{ "price": "asc" },
{ "_score": "desc" }
],
"_source": ["name", "price", "category"]
}
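Application code usually thinks in page numbers rather than offsets. The following Python helper is a small sketch of that translation. `page_params` is a hypothetical name, and the 10,000 limit reflects the default value of the `index.max_result_window` setting.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def page_params(page: int, page_size: int = 20) -> dict:
    """Translate a 1-based page number into from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window; consider search_after")
    return {"from": offset, "size": page_size}

print(page_params(1))  # {'from': 0, 'size': 20}
print(page_params(3))  # {'from': 40, 'size': 20}
```

Deep pages are expensive because each shard has to collect from + size hits before the results are merged, so for deep scrolling the search_after parameter is generally the better fit.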
6. Aggregations
Aggregations compute analytics over your data — counts, averages, histograms, and nested breakdowns.
Terms Aggregation (Group By)
# Count products per category
GET /products/_search
{
"size": 0,
"aggs": {
"categories": {
"terms": { "field": "category", "size": 20 }
}
}
}
# Returns one bucket per category under aggregations.categories.buckets,
# e.g. electronics: 150, accessories: 89, ...
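The actual response nests the counts inside a list of buckets. The Python sketch below flattens that structure into a plain dictionary. The sample response is hand-written and trimmed to the relevant part, and `buckets_to_counts` is an illustrative helper rather than part of any client library.

```python
def buckets_to_counts(response: dict, agg_name: str) -> dict:
    """Flatten a terms aggregation into {key: doc_count}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}

# Hand-written sample, shaped like a terms aggregation response
sample_response = {
    "aggregations": {
        "categories": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "electronics", "doc_count": 150},
                {"key": "accessories", "doc_count": 89},
            ],
        }
    }
}
print(buckets_to_counts(sample_response, "categories"))
# {'electronics': 150, 'accessories': 89}
```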
Metric Aggregations
# Average, min, max, sum
GET /products/_search
{
"size": 0,
"aggs": {
"avg_price": { "avg": { "field": "price" } },
"max_price": { "max": { "field": "price" } },
"total_revenue": { "sum": { "field": "price" } },
"price_stats": { "stats": { "field": "price" } }
}
}
Date Histogram
# Orders per day over the last month
GET /orders/_search
{
"size": 0,
"query": {
"range": { "created_at": { "gte": "now-30d" } }
},
"aggs": {
"orders_per_day": {
"date_histogram": {
"field": "created_at",
"calendar_interval": "day"
},
"aggs": {
"daily_revenue": { "sum": { "field": "total" } }
}
}
}
}
Nested Aggregations
# Average price per category with top products
GET /products/_search
{
"size": 0,
"aggs": {
"by_category": {
"terms": { "field": "category", "size": 10 },
"aggs": {
"avg_price": { "avg": { "field": "price" } }
}
}
}
}
7. Mappings and Analyzers
Mappings define field types. Getting mappings right is critical — you cannot change a field's type after the index is created without reindexing.
Common Field Types
PUT /products
{
"mappings": {
"properties": {
"name": { "type": "text", "analyzer": "standard" },
"category": { "type": "keyword" },
"description": { "type": "text" },
"price": { "type": "float" },
"in_stock": { "type": "boolean" },
"created_at": { "type": "date", "format": "yyyy-MM-dd" },
"tags": { "type": "keyword" },
"location": { "type": "geo_point" }
}
}
}
# text: analyzed for full-text search (tokenized and lowercased by the standard analyzer;
#       stemming requires a language analyzer such as "english")
# keyword: exact values for filtering, sorting, aggregations
# Use both when you need search AND filtering on the same field
Multi-Field Mapping
# Map a field as both text and keyword
"name": {
"type": "text",
"fields": {
"keyword": { "type": "keyword", "ignore_above": 256 }
}
}
# Search: match on "name" (analyzed)
# Sort/aggregate: use "name.keyword" (exact)
Custom Analyzers
PUT /articles
{
"settings": {
"analysis": {
"analyzer": {
"custom_english": {
"type": "custom", "tokenizer": "standard",
"filter": ["lowercase", "english_stop", "english_stemmer"]
}
},
"filter": {
"english_stop": { "type": "stop", "stopwords": "_english_" },
"english_stemmer": { "type": "stemmer", "language": "english" }
}
}
},
"mappings": {
"properties": {
"title": { "type": "text", "analyzer": "custom_english" },
"body": { "type": "text", "analyzer": "custom_english" }
}
}
}
# Test your analyzer
POST /articles/_analyze
{ "analyzer": "custom_english", "text": "The quick brown foxes are jumping" }
# Tokens: ["quick", "brown", "fox", "jump"]
8. Full-Text Search Best Practices
- Use text for searchable fields, keyword for exact match — searching on keyword fields requires exact case-sensitive matches, which is rarely what users expect.
- Boost important fields — use "fields": ["title^3", "body"] in multi_match to weight title matches higher.
- Use filter context for non-scoring conditions — filters are cached and skip scoring, making them significantly faster than must.
- Choose the right analyzer — the standard analyzer works for most cases. Use language-specific analyzers for stemming (e.g., "english" turns "running" into "run").
- Add synonyms — use a synonym filter so "laptop", "notebook", and "portable computer" all match.
- Use match_phrase for exact phrase search — "quick brown fox" matches only when those words appear together in that order.
- Implement autocomplete with edge_ngram — tokenize "elasticsearch" into "e", "el", "ela", ... for prefix-based suggestions.
- Set index: false on fields you never search — saves disk space and speeds up indexing for fields used only for display.
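The edge_ngram idea from the list above can be sketched in a few lines of Python. This only mimics the token output. The function is illustrative, and the `min_gram` / `max_gram` defaults here were picked for the example rather than taken from Elasticsearch.

```python
def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 5) -> list:
    """Return the prefixes of token with lengths from min_gram up to max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("elasticsearch"))  # ['e', 'el', 'ela', 'elas', 'elast']
print(edge_ngrams("es"))             # ['e', 'es']
```

Each prefix becomes its own term in the index, so a user who has typed "ela" hits an exact term instead of triggering a wildcard scan. The trade-off is a larger index, which is why max_gram is usually kept small.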
9. Index Templates and Lifecycle Management
Index Templates
# Create a template for all log indexes
PUT /_index_template/logs_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"index.lifecycle.name": "logs_policy"
},
"mappings": {
"properties": {
"@timestamp": { "type": "date" },
"message": { "type": "text" },
"level": { "type": "keyword" },
"service": { "type": "keyword" }
}
}
},
"priority": 100
}
# Any new index matching "logs-*" inherits these settings
Index Lifecycle Management (ILM)
# Define a lifecycle policy: hot -> warm -> cold -> delete
# (the cold phase uses readonly here because the freeze action is deprecated)
PUT /_ilm/policy/logs_policy
{
"policy": {
"phases": {
"hot": { "actions": { "rollover": { "max_size": "50gb", "max_age": "7d" } } },
"warm": { "min_age": "7d", "actions": { "shrink": { "number_of_shards": 1 }, "forcemerge": { "max_num_segments": 1 } } },
"cold": { "min_age": "30d", "actions": { "readonly": {} } },
"delete": { "min_age": "90d", "actions": { "delete": {} } }
}
}
}
Reindexing
# Reindex data from one index to another (useful for mapping changes)
POST /_reindex
{
"source": { "index": "products_v1" },
"dest": { "index": "products_v2" }
}
# Use aliases for zero-downtime reindexing
POST /_aliases
{
"actions": [
{ "remove": { "index": "products_v1", "alias": "products" } },
{ "add": { "index": "products_v2", "alias": "products" } }
]
}
10. Performance Tuning
Bulk Operations
# Bulk API: index, update, or delete many documents in one request
POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "Keyboard", "price": 49.99, "category": "electronics"}
{"index": {"_index": "products", "_id": "2"}}
{"name": "Mouse", "price": 29.99, "category": "electronics"}
{"delete": {"_index": "products", "_id": "3"}}
# Always use bulk for batch operations
# Optimal batch size: 5-15 MB per request, or 1000-5000 documents
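A bulk body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. The Python sketch below builds such a body and reports its size so batches can be kept within the range mentioned above. `build_bulk_body` is a hypothetical helper that only handles the index action.

```python
import json

def build_bulk_body(index_name: str, docs: dict) -> str:
    """Build an NDJSON body for POST /_bulk from {doc_id: source} pairs."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

body = build_bulk_body("products", {
    "1": {"name": "Keyboard", "price": 49.99, "category": "electronics"},
    "2": {"name": "Mouse", "price": 29.99, "category": "electronics"},
})
print(body, end="")
print(len(body.encode("utf-8")), "bytes")
```

When sending the body over HTTP, the Content-Type header should be application/x-ndjson.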
Refresh Interval
# Documents are not searchable until a refresh (default: 1 second)
# For heavy indexing, increase the interval
PUT /products/_settings
{
"index.refresh_interval": "30s"
}
# Disable refresh during bulk loading, re-enable after
PUT /products/_settings
{ "index.refresh_interval": "-1" }
# ... bulk index millions of documents ...
PUT /products/_settings
{ "index.refresh_interval": "1s" }
POST /products/_refresh
JVM Heap and System Settings
# Set heap to 50% of available RAM, max 31 GB
# In jvm.options or ES_JAVA_OPTS:
-Xms16g
-Xmx16g
# Always set Xms and Xmx to the same value (avoid resizing)
# System settings for production (in /etc/sysctl.conf):
vm.max_map_count=262144
vm.swappiness=1
# File descriptor limit (in /etc/security/limits.conf):
elasticsearch - nofile 65535
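The heap rule above (half of RAM, capped at 31 GB) is simple arithmetic. As a rough sketch, this Python helper turns a machine's RAM into matching -Xms/-Xmx values. The function name and the rounding down to whole gigabytes are assumptions made for the example.

```python
def recommended_heap_gb(total_ram_gb: float, cap_gb: int = 31) -> int:
    """Half of RAM, rounded down to whole GB, capped at cap_gb (minimum 1)."""
    return max(1, min(int(total_ram_gb // 2), cap_gb))

for ram in (8, 32, 64, 128):
    heap = recommended_heap_gb(ram)
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
# 8 GB RAM -> -Xms4g -Xmx4g
# 32 GB RAM -> -Xms16g -Xmx16g
# 64 GB RAM -> -Xms31g -Xmx31g
# 128 GB RAM -> -Xms31g -Xmx31g
```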
Performance Checklist
- Use bulk API for batch indexing — individual document writes are 10-100x slower
- Use filter context in bool queries — filters are cached and skip scoring
- Avoid wildcard queries with leading wildcards — *board scans every term in the index
- Use keyword type for sorting and aggregations — sorting on text fields requires fielddata (very memory-intensive)
- Right-size shards — aim for 10-50 GB per shard, avoid thousands of tiny shards
- Use SSD storage — Elasticsearch is I/O intensive; SSDs improve performance dramatically
- Force merge read-only indexes — merging segments improves query speed on indexes that are no longer written to
11. Elasticsearch with Kibana
Kibana is the official visualization platform for Elasticsearch. It provides a web UI for querying, building dashboards, and managing your cluster.
- Discover — explore and filter your data interactively. Search logs, inspect documents, and see field distributions.
- Dashboard — combine visualizations into interactive dashboards. Share with your team for monitoring and analysis.
- Dev Tools Console — write and test Elasticsearch queries directly in the browser with autocomplete and formatting.
- Index Management — view index health, manage ILM policies, and configure index templates.
- Alerting — set up rules to notify you when query conditions are met (e.g., error rate spikes above a threshold).
# Create a data view (formerly index pattern) in Kibana:
# 1. Go to Stack Management > Data Views
# 2. Create data view: "logs-*"
# 3. Set @timestamp as the time field
# 4. Go to Discover to explore your data
# Kibana also supports Canvas (pixel-perfect reports),
# Lens (drag-and-drop visualizations), and Maps (geospatial data)
12. Security
Elasticsearch 8.x enables security by default. Always configure authentication, TLS, and role-based access in production.
Authentication
# Reset the elastic superuser password
bin/elasticsearch-reset-password -u elastic
# Create a user with a specific role
POST /_security/user/app_reader
{
"password": "secure_password_here",
"roles": ["reader_role"],
"full_name": "Application Reader"
}
# Connect with credentials
curl -u elastic:your_password https://localhost:9200
TLS/SSL and Role-Based Access
# Generate certificates (built-in tool)
bin/elasticsearch-certutil ca
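bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
# Generate the HTTP-layer certificate referenced below (interactive; the output archive contains http.p12)
bin/elasticsearch-certutil http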
# elasticsearch.yml — enable transport and HTTP TLS
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: http.p12
# Create a read-only role
POST /_security/role/reader_role
{
"indices": [{
"names": ["products*", "logs-*"],
"privileges": ["read", "view_index_metadata"]
}]
}
# Create a writer role
POST /_security/role/writer_role
{
"indices": [{
"names": ["products"],
"privileges": ["write", "create_index", "read"]
}]
}
13. Common Pitfalls and Troubleshooting
- Mapping explosion — dynamic mapping creates a field for every new JSON key. Set "dynamic": "strict" to reject unexpected fields, or "dynamic": "false" to ignore them.
- Yellow cluster status — means replicas cannot be allocated. On a single-node cluster, set number_of_replicas: 0. On multi-node, ensure you have enough nodes for replica placement.
- Max shards per node exceeded — the default limit is 1000 shards per node. Delete old indexes, use ILM to manage lifecycle, and right-size your shard count.
- Slow queries on text fields — avoid sorting or aggregating on text fields. Use keyword sub-fields instead. Enable fielddata only as a last resort.
- Out of memory (OOM) — Elasticsearch heap should be max 50% of RAM, never more than 31 GB. Leave the rest for the OS file cache, which Lucene relies on heavily.
- Circuit breaker exceptions — queries requiring too much memory are rejected. Reduce aggregation cardinality, add filters, or increase the circuit breaker limit carefully.
# Check cluster health
GET /_cluster/health
# See unassigned shards
GET /_cat/shards?v&s=state&h=index,shard,state,unassigned.reason
# Check node resource usage
GET /_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,disk.used_percent
# Find slow queries in the slow log
PUT /products/_settings
{
"index.search.slowlog.threshold.query.warn": "5s",
"index.search.slowlog.threshold.query.info": "2s"
}
14. Elasticsearch vs Alternatives
- Apache Solr — also built on Lucene, similar full-text search capabilities. Solr has excellent XML/schema support and is mature. Elasticsearch wins on ease of setup, REST API design, real-time analytics, and the Kibana ecosystem. Choose Solr if you already have Solr expertise or need advanced XML handling.
- Meilisearch — a lightweight, fast search engine optimized for front-end search. Instant results, typo tolerance, and faceting out of the box. Ideal for small-to-medium datasets (under 10M documents) where developer experience matters. Not suited for log analytics or complex aggregations.
- Typesense — similar to Meilisearch: simple API, fast typo-tolerant search, easy to operate. Better hardware efficiency than Elasticsearch for simple search use cases. Lacks the aggregation depth and ecosystem of Elasticsearch.
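- OpenSearch — an open-source fork of Elasticsearch 7.10, started by AWS and now governed by the OpenSearch Software Foundation under the Linux Foundation. Its APIs are largely compatible with Elasticsearch 7.10, though the two projects have diverged since the fork. Choose OpenSearch if you want an Apache 2.0 license or run on AWS.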
When to choose Elasticsearch: you need full-text search at scale, complex aggregations, log analytics, the ELK ecosystem, or geospatial queries. When to choose an alternative: you need a simple search box for a small dataset (Meilisearch/Typesense) or want a purely open-source license (OpenSearch).
Frequently Asked Questions
What is the difference between an Elasticsearch index and a database table?
An Elasticsearch index is roughly analogous to a database table, but stores JSON documents instead of rows, uses mappings instead of a fixed schema, and indexes every field by default (text fields for full-text search, keyword and numeric fields for exact matching and ranges). Indexes are distributed across shards for horizontal scaling, and documents do not need identical fields, giving you schema flexibility.
How many shards should I configure for an Elasticsearch index?
A good starting point is one shard per 10-50 GB of data. Each shard consumes memory and file descriptors, so too many small shards waste resources. For indexes under 10 GB, a single shard is usually sufficient. Keep total shards under 20 per GB of heap memory across the cluster.
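These rules of thumb translate directly into arithmetic. The Python sketch below assumes a target of 30 GB per shard (a midpoint of the 10-50 GB range) and the 20-shards-per-GB-of-heap ceiling mentioned above. Both function names are made up for illustration.

```python
import math

def suggest_primary_shards(index_size_gb: float, target_shard_gb: float = 30) -> int:
    """Number of primary shards needed to keep each shard near the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def max_shards_for_heap(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Upper bound on shards for a given amount of heap."""
    return int(heap_gb * shards_per_gb)

print(suggest_primary_shards(8))    # 1
print(suggest_primary_shards(200))  # 7
print(max_shards_for_heap(16))      # 320
```

Treat the output as a starting point to validate against real query and indexing load, not as a fixed answer.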
When should I use Elasticsearch instead of a relational database?
Use Elasticsearch when you need full-text search with relevance scoring, fuzzy matching, or autocomplete. It excels at log analytics, searching millions of documents, real-time aggregations, and geospatial queries. Do not use it as a primary database for transactional data requiring ACID guarantees or complex joins. The most common pattern is running Elasticsearch alongside a relational database.
How does Elasticsearch handle full-text search differently from SQL LIKE?
SQL LIKE performs pattern matching on raw text, and patterns with a leading wildcard (such as LIKE '%term%') generally cannot use an ordinary index. Elasticsearch uses inverted indexes: text is tokenized and lowercased during ingestion, and with a stemming analyzer such as "english", a search for "running" also matches "run" and "runs". Results are scored by relevance using BM25, which typically makes full-text search much faster and more useful than LIKE queries on large text collections.
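To give a feel for how BM25 ranks results, here is the textbook per-term formula as a Python sketch. It uses k1 = 1.2 and b = 0.75, the default parameter values. The function name and the sample numbers are illustrative, and real scores also depend on per-shard statistics and implementation details not modeled here.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, docs_with_term, k1=1.2, b=0.75):
    """Textbook BM25 score for one query term in one document."""
    # Rare terms get a higher inverse document frequency
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency saturates, and long documents are penalized via b
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm

short_doc = bm25_term_score(tf=1, doc_len=5, avg_doc_len=20, num_docs=1000, docs_with_term=10)
long_doc = bm25_term_score(tf=1, doc_len=80, avg_doc_len=20, num_docs=1000, docs_with_term=10)
common = bm25_term_score(tf=1, doc_len=5, avg_doc_len=20, num_docs=1000, docs_with_term=500)

print(short_doc > long_doc)  # True: the same match counts for more in a short field
print(short_doc > common)    # True: a rare term counts for more than a common one
```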
What is the ELK Stack and how do the components work together?
The ELK Stack consists of Elasticsearch (search and storage), Logstash (data ingestion and transformation), and Kibana (visualization and dashboards). Beats are lightweight data shippers that send logs, metrics, and network data to the stack. Together, they form a complete observability and search platform used by thousands of organizations for log management and analytics.
Conclusion
Elasticsearch is one of the most widely used search and analytics engines available today. Start with a single-node Docker setup, define explicit mappings for your indexes, and use bool queries with filter context for fast, relevant search. As your data grows, leverage index lifecycle management to automate data retention, bulk operations for efficient indexing, and the Kibana ecosystem for visualization.
For production, always enable security (TLS + authentication), right-size your shards (10-50 GB each), set JVM heap to half your RAM (max 31 GB), and use aliases with reindexing for zero-downtime schema changes.
Learn More
- JSON Formatter — format and validate Elasticsearch JSON queries and responses
- Docker Compose: The Complete Guide — deploy Elasticsearch and Kibana with a single YAML file
- PostgreSQL Complete Guide — relational database to pair with Elasticsearch as your source of truth
- Redis Complete Guide — caching layer to complement Elasticsearch for frequently accessed data