Rebalancing for Dynamic Scaling
The system implements automatic rebalancing when consumers join or leave:
- Consumer Join: Existing consumers voluntarily release partitions to accommodate new consumers
- Consumer Leave: Other consumers claim released partitions to maintain processing
- Fair Distribution: Partitions are redistributed evenly among available consumers
This approach ensures no message loss during scaling events while maintaining ordering guarantees.
Key Implementation Details
1. Message Deduplication
Using JetStream’s message ID feature prevents duplicate processing:
ack, err := js.Publish(pubCtx, subject, b, jetstream.WithMsgID(idKey))
2. Consumer Durability
Durable consumers ensure message delivery even when consumers restart:
cfg := &nats.ConsumerConfig{ Durable: durable, FilterSubject: filterSubject, AckPolicy: nats.AckExplicitPolicy, AckWait: 2 * time.Minute, MaxAckPending: 512, ReplayPolicy: nats.ReplayInstantPolicy, }
3. Lease Management
The lease store provides atomic operations for claiming, renewing, and releasing partitions:
// Try to claim a partition info, err := ls.TryClaim(pid, app, leaseTTL) // Renew an existing lease info, err := ls.Renew(pid, app, prevRev, leaseTTL) // Release a partition when shutting down err = ls.Release(pid, app)
Failover and Recovery Mechanisms
The system handles various failure scenarios gracefully:
- Graceful Shutdown: Consumers release partitions before terminating
- Forced Termination: Leases expire automatically, allowing other consumers to claim partitions
- Network Partitions: Heartbeat mechanisms detect disconnected consumers
- Process Failures: Expired leases enable fast failover to other consumers
Performance Considerations
This approach provides several benefits:
- Maintained Ordering: Messages with the same key are processed in order
- Scalability: Consumers can be added or removed dynamically
- Fault Tolerance: Automatic recovery from failures
- Resource Efficiency: Partitions are evenly distributed among consumers
However, there are trade-offs:
- Partition Count: Must be planned based on expected throughput
- Lease Overhead: Coordination requires additional storage and network operations
- Complexity: More complex than simple queue-based processing
Configuration Parameters
Key configuration parameters affect system behavior:
# Partitioning PARTITIONS=8 # Lease management LEASE_TTL_SEC=5 HEARTBEAT_SEC=2 # Consumer behavior FETCH_BATCH=10
These values balance between responsiveness and system overhead.
Monitoring and Observability
The system includes monitoring capabilities:
- Process Management: Track active consumers and their assigned partitions
- Lease Visualization: View current partition ownership
- Message Flow: Monitor stream messages and processing status
Conclusion
This implementation demonstrates how NATS JetStream can achieve Kafka-like message ordering through careful architectural design. By combining partitioned subjects, consistent hashing, lease-based coordination, and automatic rebalancing, we can build systems that maintain strict ordering guarantees while remaining scalable and fault-tolerant.
The key advantages of this approach include:
- Native NATS ecosystem integration
- Dynamic scaling capabilities
- Fast failover mechanisms
- Strong ordering guarantees within partitions
- No external dependencies beyond NATS
For applications requiring strict message ordering with the performance and simplicity of NATS, this pattern provides an excellent alternative to traditional message queue solutions. For reference, you can see the example directly here