Elasticsearch can handle millions or even billions of documents, and it stays fast and scalable only if you manage it correctly. When your data grows very large, poor shard planning or uneven data distribution can make the cluster slow or unstable.
This article explains how to maintain very large datasets in Elasticsearch, including the trade-offs between many vs. few shards, Index Lifecycle Management (ILM), and how to prevent hot nodes.
The Challenge of Large Data
Elasticsearch splits every index into smaller parts called shards. Each shard is a self-contained Lucene index that stores a portion of your documents.
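The number of primary shards is fixed when an index is created, so it has to be chosen up front. A minimal sketch (the index name and the values here are only illustrative):

PUT logs-000001
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}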
When your data grows:
- Queries become slower
- Nodes use too much memory and CPU
- Backups take longer
- Cluster recovery after restart can be very slow
So, shard sizing and balancing are critical for performance and stability.
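A quick way to see how shards and disk usage are spread across the nodes, and to spot imbalance early, is the allocation cat API:

GET _cat/allocation?v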
Ideal Shard Size and Trade-Offs
There is no perfect shard size; it depends on your data type and query patterns. Still, most clusters work well when each shard is around 10–50 GB.
| Type of Data | Recommended Shard Size |
|---|---|
| Logs / time-based data | 10–50 GB |
| Analytics / numeric data | 50–100 GB |
| Text-heavy search | 10–30 GB |
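To compare your own indices against these ranges, you can list them sorted by size (a sketch using the standard _cat/indices columns):

GET _cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc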
Trade-Off: Many Small Shards vs. Few Large Shards
| Case | Pros | Cons |
|---|---|---|
| Many small shards | Better parallelism; faster recovery of individual shards | High memory overhead; large cluster state; GC pressure; slower searches due to coordination overhead |
| Few large shards | Less overhead; simpler cluster state | Long recovery times; a single shard can become a bottleneck; risk of a “hot shard” |
Each shard consumes heap memory even if it is small, so thousands of small shards can waste resources. On the other hand, very large shards (for example, >100 GB) can make merges and queries slow.
Rule of thumb:
Keep each shard between 20 GB and 50 GB, and keep the total shard count per node under a few thousand (ideally below 2000).
You can check shard sizes with:
GET _cat/shards?v
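To spot oversized shards quickly, you can restrict the output to a few columns and sort by store size (the column names below are the standard _cat/shards headers):

GET _cat/shards?v&h=index,shard,prirep,store,node&s=store:desc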
Use Index Lifecycle Management (ILM)
ILM automatically manages the size, age, and location of your data. It helps prevent oversized indices and automates clean-up.
Example Policy
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "include": { "box_type": "warm" }
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
This policy rolls indices over before they grow too large, moves older data to cheaper warm nodes, and deletes it after 90 days.
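The policy only takes effect once it is attached to indices, typically through an index template. A minimal sketch (the template name, the logs-* pattern, and the logs rollover alias are illustrative choices, not requirements):

PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}

The first backing index (for example logs-000001) then needs to be created with the logs alias marked as its write index so that rollover has a target.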
Hot–Warm–Cold Architecture
Not all data is equal: recent data is queried all the time (hot), while old data is rarely touched (cold). You can separate them across node tiers.
| Tier | Role | Hardware |
|---|---|---|
| Hot | Active data, frequent queries and indexing | Fast CPU, SSD |
| Warm | Older data, less indexing | Medium CPU, HDD |
| Cold | Rarely accessed archives | Large but slow storage |
| Frozen | Archived snapshots | Low-cost object storage |
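In practice, these tiers are often implemented by tagging each node with a custom attribute in its elasticsearch.yml (box_type is only a convention; any attribute name works as long as the ILM policy uses the same one):

node.attr.box_type: warm

Hot nodes would set node.attr.box_type: hot instead; newer Elasticsearch versions can also use the built-in data tier node roles for the same purpose.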
ILM can then move indices between these tiers automatically; the warm phase in the policy above does this with the allocate action:
"allocate": { "include": { "box_type": "warm" } }