
Widhian Bramantya

coding is an art form


Maintaining Super Large Datasets in Elasticsearch

Posted on October 5, 2025 by admin

Understanding “Hot Nodes”

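A hot node is a node that carries more load than the others, for example because it holds too many shards or a few very active ones. (This is different from the "hot" tier of a hot-warm-cold architecture, where nodes are deliberately given the newest, busiest data.)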

Causes of Hot Nodes:

  • Uneven shard distribution (some nodes hold more or larger shards than others)
  • Skewed traffic (most queries and writes target the same few indices)
  • Oversized shards (a single shard too large to spread its work)
  • Missing ILM or rollover policy
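The first cause is easy to spot in allocation data. The sketch below flags nodes whose shard count or on-disk index size is well above the cluster average. It is a minimal Python sketch that assumes rows shaped like the JSON output of `GET _cat/allocation?format=json&bytes=b`, with the fields `node`, `shards`, and `disk.indices`. The sample data and the 1.25x threshold are hypothetical.

```python
import json

# Hypothetical sample shaped like `GET _cat/allocation?format=json&bytes=b`.
# _cat APIs return numbers as strings, so they are converted with int().
SAMPLE = json.loads("""
[
  {"node": "node-1", "shards": "40", "disk.indices": "500000000000"},
  {"node": "node-2", "shards": "42", "disk.indices": "520000000000"},
  {"node": "node-3", "shards": "75", "disk.indices": "1100000000000"}
]
""")


def find_hot_nodes(rows, factor=1.25):
    """Return nodes whose shard count or index disk usage exceeds `factor`
    times the cluster average. The default factor is an arbitrary choice."""
    # Skip rows without disk data, such as an UNASSIGNED summary row.
    rows = [r for r in rows if r.get("disk.indices") is not None]
    shards = {r["node"]: int(r["shards"]) for r in rows}
    disk = {r["node"]: int(r["disk.indices"]) for r in rows}
    avg_shards = sum(shards.values()) / len(shards)
    avg_disk = sum(disk.values()) / len(disk)
    return sorted(
        node
        for node in shards
        if shards[node] > factor * avg_shards or disk[node] > factor * avg_disk
    )


print(find_hot_nodes(SAMPLE))  # ['node-3']
```

Comparing each node with the average, rather than with a fixed limit, lets the same check work on clusters of any size.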

How to Prevent Hot Nodes:

  1. Check shard allocation: GET _cat/allocation?v
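  2. Balance shards across nodes. Elasticsearch rebalances automatically, but you can tune the balancer weights (cluster.routing.allocation.balance.shard, cluster.routing.allocation.balance.index) or cap how many shards of one index a node may hold with index.routing.allocation.total_shards_per_node.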
  3. Use ILM to roll over indices before they grow too big.
  4. Add more nodes if needed (scale horizontally) so the load has more places to go.
  5. Avoid custom routing values that send most writes to one shard. Pick a routing key with many distinct, evenly used values.
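A simplified model shows why the routing key matters. Elasticsearch computes the target shard from a murmur3 hash of the routing value (the document `_id` by default) combined with the index's routing settings. The Python sketch below substitutes an MD5-based hash only to demonstrate the effect of a low-cardinality routing key. The tenant names and volumes are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(routing_value: str, num_shards: int) -> int:
    # Simplified stand-in for Elasticsearch's murmur3-based routing.
    # The property it shares: one routing value always maps to one shard.
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def shard_load(routing_values, num_shards):
    """Count how many documents each shard would receive."""
    counts = Counter(shard_for(v, num_shards) for v in routing_values)
    return [counts.get(s, 0) for s in range(num_shards)]


# Hypothetical write stream: one big tenant produces 90% of the documents.
skewed = ["tenant-big"] * 900 + [f"tenant-{i}" for i in range(100)]
# The same volume routed by a unique per-document key (like the default _id).
even = [f"doc-{i}" for i in range(1000)]

print(shard_load(skewed, 5))  # one shard receives at least 900 documents
print(shard_load(even, 5))    # roughly 200 documents per shard
```

Adding shards does not help the skewed case. All of the big tenant's documents still hash to a single shard.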

A balanced cluster keeps CPU, heap, and disk usage similar across all nodes.

Shrink Oversized Indices

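If you already have a very large index with too many shards, you can shrink it into a new index with fewer primary shards. Before shrinking, the source index must be read-only, its health must be green, and a copy of every shard must sit on the same node.

  1. Block writes and move a copy of every shard to one node (replace shrink_node_name with one of your node names): PUT my_index/_settings { "settings": { "index.blocks.write": true, "index.routing.allocation.require._name": "shrink_node_name" } }
  2. Shrink to fewer shards. The target count must be a factor of the source count (a 12-shard index can shrink to 6, 4, 3, 2, or 1). Setting the two copied settings to null lets the new index allocate normally and accept writes: POST my_index/_shrink/my_index_small { "settings": { "index.number_of_shards": 1, "index.routing.allocation.require._name": null, "index.blocks.write": null } }

Fewer shards mean less per-shard heap overhead and fewer shard-level requests per search.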
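The shrink API only accepts a target shard count that is a factor of the source count and smaller than it. A small Python helper can list the valid choices before you start moving shards:

```python
def valid_shrink_targets(source_shards: int) -> list[int]:
    """Shard counts a shrink can target: factors of the source shard count
    that are strictly smaller than it."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(valid_shrink_targets(12))  # [1, 2, 3, 4, 6]
print(valid_shrink_targets(7))   # [1]
```

An index with a prime number of shards, such as 7, can therefore only be shrunk to a single shard.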

Force Merge Old Indices

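When an index is no longer updated (like last month's logs), you can merge its segments down to one per shard to speed up searches:

POST logs-2025-09/_forcemerge?max_num_segments=1

Run this only on indices you have finished writing to. Force merge is I/O-heavy. On an index that keeps receiving writes, it can also leave very large segments that the automatic merge policy will not consider again until they consist mostly of deleted documents.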

Monitor Shard and Node Health

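Regular monitoring catches problems while they are still small:

GET _cat/indices?v
GET _cat/shards?v
GET _cluster/health

The status field returned by _cluster/health means:
  • Green: all primary and replica shards are assigned
  • Yellow: all primaries are assigned, but at least one replica is not
  • Red: at least one primary shard is unassigned, so some data cannot be searched or written (critical)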
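You can express the color rules as a small function over shard states. This simplified Python sketch assumes rows shaped like `GET _cat/shards?format=json`. In those rows, `prirep` is `p` or `r`, and `state` is one of STARTED, RELOCATING, INITIALIZING, or UNASSIGNED. The real `_cluster/health` computation has more detail, and the sample rows are hypothetical.

```python
# Hypothetical rows shaped like `GET _cat/shards?format=json`.
ROWS = [
    {"index": "logs-2025-10", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs-2025-10", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "logs-2025-09", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs-2025-09", "shard": "0", "prirep": "r", "state": "STARTED"},
]

# Shards in these states are serving requests; all other states are inactive.
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def health_from_shards(rows) -> str:
    """Simplified: red if any primary is inactive, yellow if only replicas are."""
    inactive = [r for r in rows if r["state"] not in ACTIVE_STATES]
    if any(r["prirep"] == "p" for r in inactive):
        return "red"
    if inactive:
        return "yellow"
    return "green"


print(health_from_shards(ROWS))  # yellow
```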

Use Kibana Monitoring or ElasticHQ to see hot nodes and shard distribution visually.

Use Snapshots for Backup

Back up large datasets with snapshot and restore.
Snapshots are incremental. Each new snapshot copies only the data that an earlier snapshot has not already stored in the repository.

PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": { "location": "/mnt/es_backup" }
}

PUT _snapshot/my_backup/snapshot_2025_10
{ "indices": "my_index*" }

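Always keep the repository on storage outside the cluster (a shared filesystem or an object store such as S3) so that a node failure cannot take your backups with it. For the fs type shown here, the location must be a shared filesystem mounted on every master and data node and listed under path.repo in elasticsearch.yml.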
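One way to picture the incremental behavior is as a set difference over segment files. This is a conceptual Python model, not Elasticsearch's snapshot code, and the file names are hypothetical.

```python
def files_to_upload(shard_files: set[str], repository_files: set[str]) -> set[str]:
    """Conceptual model: a snapshot copies only the segment files that the
    repository does not already hold from earlier snapshots."""
    return shard_files - repository_files


repo: set[str] = set()
first = {"_0.cfs", "_1.cfs"}
repo |= files_to_upload(first, repo)  # the first snapshot copies everything

second = {"_0.cfs", "_1.cfs", "_2.cfs"}  # one new segment since then
print(sorted(files_to_upload(second, repo)))  # ['_2.cfs']
```

The same model explains why a snapshot taken right after a force merge can be large: merging writes new segment files that the repository has not seen before.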


Avoid Too Many Indices

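Thousands of tiny indices can hurt performance just as much as oversized ones.
Every index and shard adds to the cluster state, which the master node must maintain and publish to all nodes, and every shard carries its own heap overhead.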

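Combine similar data into a single index with a category or source field.
Use index templates to ensure consistent settings:

PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "priority": 501,
  "template": {
    "settings": { "number_of_shards": 3 },
    "mappings": { "properties": { "timestamp": { "type": "date" } } }
  }
}

Recent Elasticsearch versions ship a built-in data stream template for logs-*-* with priority 100. Names like logs-2025-09 match that pattern too. Either choose a non-overlapping prefix or give your template a priority above 500, as in the example.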
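A quick calculation shows how fast separate per-source indices multiply shard counts. The sketch assumes 3 primary shards per index, as in the template above, plus one replica (the Elasticsearch default). The 30 sources and 90 days of retention are hypothetical.

```python
def total_shards(indices: int, primaries_per_index: int, replicas: int) -> int:
    """Every primary and each of its replicas is a separate shard copy
    that the cluster must track and allocate."""
    return indices * primaries_per_index * (1 + replicas)


# Hypothetical: one index per source per day, 30 sources kept for 90 days.
per_source = total_shards(indices=30 * 90, primaries_per_index=3, replicas=1)
# The same data combined into one daily index with a `source` field.
combined = total_shards(indices=90, primaries_per_index=3, replicas=1)
print(per_source, combined)  # 16200 540
```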

Schedule Regular Maintenance

To keep the cluster clean and healthy:

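  • Delete or roll over old indices
  • Review shard balance periodically (the allocator rebalances automatically, but check _cat/allocation for drift)
  • Check heap usage and disk usage against the disk watermarks
  • Restart nodes one at a time (rolling restart), waiting for the cluster to return to green before moving on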
  • Test ILM policies and snapshot restore regularly
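For the disk check, it helps to know the default watermarks: 85% (low), 90% (high), and 95% (flood stage) of disk used. You can change them under `cluster.routing.allocation.disk.watermark.*`. The minimal Python sketch below classifies nodes, assuming `disk.percent` values like those from `GET _cat/allocation?format=json` and a cluster that still uses the defaults. The node values are hypothetical.

```python
# Default Elasticsearch disk watermarks, as percent of disk used.
# A cluster may override these, so treat them as assumptions.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0


def watermark_level(disk_percent: float) -> str:
    """Return the highest default watermark this disk usage exceeds."""
    if disk_percent > FLOOD_STAGE:
        return "flood_stage"  # indices with a shard on this node get a write block
    if disk_percent > HIGH:
        return "high"  # Elasticsearch tries to move shards off this node
    if disk_percent > LOW:
        return "low"  # no new shards here (new indices' primaries excepted)
    return "ok"


# Hypothetical `disk.percent` strings as returned by the _cat API.
usage = {"node-1": "62", "node-2": "88", "node-3": "96"}
for node, pct in usage.items():
    print(node, watermark_level(float(pct)))
# node-1 ok
# node-2 low
# node-3 flood_stage
```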

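Example to delete old data. Since Elasticsearch 8.0, action.destructive_requires_name defaults to true, so a wildcard delete like this one is rejected unless you change that setting, list the index names explicitly, or let an ILM delete phase do the work:

DELETE logs-2024*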
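A retention script can build an explicit list of index names instead of a wildcard. An explicit list also works when wildcard deletes are disabled. This Python sketch assumes monthly indices named like logs-YYYY-MM, as in the examples in this post. The index names and cutoff date are hypothetical.

```python
from datetime import date


def months_before(index_names, prefix: str, cutoff: date) -> list[str]:
    """Return indices named <prefix>YYYY-MM whose month is before `cutoff`."""
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            year, month = map(int, name[len(prefix):].split("-"))
        except ValueError:
            continue  # not a monthly index name; leave it alone
        if (year, month) < (cutoff.year, cutoff.month):
            old.append(name)
    return sorted(old)


NAMES = ["logs-2024-11", "logs-2024-12", "logs-2025-01", "logs-2025-09", "metrics-2024-01"]
old = months_before(NAMES, "logs-", date(2025, 1, 1))
print("DELETE " + ",".join(old))  # DELETE logs-2024-11,logs-2024-12
```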

Summary of Trade-Offs and Balance

| Decision | Benefit | Risk |
|---|---|---|
| More shards | Parallel queries, faster indexing | Memory overhead, coordination cost |
| Fewer shards | Less overhead, simpler state | Longer recovery, possible "hot shard" |
| Too many small indices | Easy to isolate data | Metadata overload |
| Very large index | Simple management | Slow queries and snapshots |

The goal is balance: not too many shards, and not too few.
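To put a number on that balance, you can size primary shards from the expected data volume. Elastic's sizing guidance commonly cites roughly 10GB to 50GB per shard. The 40GB target below is an assumption within that range. The sketch ignores replicas, which copy each primary rather than adding primaries.

```python
import math


def primary_shards_for(total_gb: float, target_shard_gb: float = 40.0) -> int:
    """Smallest primary shard count that keeps each shard at or below
    target_shard_gb (40GB is an assumed target, not a fixed rule)."""
    return max(1, math.ceil(total_gb / target_shard_gb))


print(primary_shards_for(600))  # 15
print(primary_shards_for(5))    # 1
```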

Conclusion

Maintaining super large Elasticsearch datasets comes down to balance and planning.
Use ILM to automate the data lifecycle, monitor shard sizes, distribute load evenly, and keep an eye on node health. Avoid both shard explosion and giant shards, and find the sweet spot for your workload.

“Good clusters are like good teams, evenly balanced, not too hot, not too cold.”

If you follow these habits, your Elasticsearch cluster will stay fast, stable, and scalable, no matter how big your data grows.

Category: ElasticSearch
