Day 8: System Design(Message Queues and Event Driven) — Part 2
Modern distributed systems rely on message queues and event-driven architecture to handle large-scale workloads efficiently. While the basics focus on asynchronous processing, fault tolerance, and scalability, advanced topics dive into message consistency, exactly-once processing, distributed transactions, and real-time event streaming.
Message Queue Guarantees & Delivery Semantics
Different message queues provide different levels of message delivery guarantees, which are critical for data consistency and system reliability.
1. At-Most-Once Delivery
- The message is delivered once or not at all.
- If a failure occurs, the message might be lost.
- Use Case: Low-priority notifications where losing a message is acceptable.
- Example: Sending analytics events to a logging system (losing some logs is fine).
2. At-Least-Once Delivery
- The message is guaranteed to be delivered but may be received multiple times (duplicates).
- The consumer must handle idempotency (same message processed multiple times should not cause incorrect results).
- Use Case: Payment processing, inventory updates.
- Example: If an order confirmation message is retried, the system must prevent charging the customer twice.
3. Exactly-Once Delivery
- The message is guaranteed to be delivered only once, even if failures occur.
- Difficult to achieve in distributed systems because it requires deduplication, transaction support, and fault tolerance mechanisms.
- Use Case: Financial transactions, stock trading.
- Example: A bank transaction must be processed exactly once to avoid money being deducted twice.
- How to Achieve? Kafka’s Exactly-Once Semantics (EOS), Idempotent Consumers, Two-Phase Commit (2PC).
Message Ordering and Deduplication
1. Message Ordering Issues
- In distributed systems, messages may arrive out of order if multiple consumers process them in parallel.
- Example: A chat application may receive messages out of sequence if different servers process them independently.
- Kafka partitions guarantee order per partition.
- FIFO (First In, First Out) queues in AWS SQS ensure ordering.
2. Message Deduplication
- If a producer resends a message due to a failure, consumers may receive duplicates.
- Example: If an email confirmation is resent, the system should not send multiple confirmation emails.
- Use unique message IDs and store processed message IDs.
- Idempotent operations ensure that reprocessing the same message has no side effects.
Distributed Transactions in Message Queues
Handling transactions across multiple microservices using message queues is complex due to eventual consistency issues.
1. Two-Phase Commit (2PC)
- A coordinator ensures all services commit or rollback together.
- Example: In bank transfers, both the sender and receiver accounts must update simultaneously.
- Disadvantage: Slower and can cause distributed locking issues.
2. Saga Pattern (Chained Transactions)
- Instead of locking resources, a series of local transactions is executed with compensating actions if a failure occurs.
- Step 1: Deduct money from user account.
- Step 2: Reserve stock in inventory.
- Step 3: If payment fails, rollback stock reservation.
Advantage: Avoids locking, better for microservices.
Disadvantage: Complex compensating logic needed for rollback scenarios.
Event-Driven Architecture in Large-Scale Systems
1. Event Sourcing
- Instead of storing the current state, the system logs every event that changes data.
- Example: Instead of storing “Account Balance = $1000”, store all transactions:
- Deposit $500 → Balance $500
- Withdraw $200 → Balance $300
- Deposit $700 → Balance $1000
- Full audit trail (history of all changes).
- Ability to replay events for debugging or restoring past states.
- Used in: Blockchain, banking systems, CRMs.
2. Event Replay & Time Travel
- Systems like Kafka and Apache Pulsar allow reprocessing past events.
- Example: If a new recommendation algorithm is added to Netflix, old user interactions can be replayed to retrain the model.
3. Event Choreography vs. Orchestration
Choreography (Decentralized, No Central Coordinator)
- Services react independently when events occur.
- Example: A user places an order → the payment service and inventory service listen for this event and act accordingly.
- Pros: Loose coupling, scalable.
- Cons: Harder to track dependencies.
Orchestration (Centralized Controller)
- A single orchestrator service manages the entire flow.
- Example: In a ride-hailing app, the “Ride Management Service” coordinates booking, driver matching, and payments.
- Pros: Easier to monitor and debug.
- Cons: The orchestrator can become a bottleneck.
Streaming & Real-Time Event Processing
1. What is Event Streaming?
- Unlike traditional queues that store messages temporarily, event streaming platforms like Kafka, Apache Pulsar, and AWS Kinesis store messages persistently and allow real-time processing.
- Example: A stock trading platform needs real-time updates for price fluctuations.
2. Lambda Architecture (Batch + Streaming Processing)
- Batch Processing: Handles historical data (e.g., nightly reports).
- Stream Processing: Handles real-time data (e.g., live analytics).
- Example: Twitter’s real-time trending topics use stream processing, while historical tweet analytics use batch processing.
Future Trends in Messaging & Event-Driven Systems
1. Serverless Event-Driven Architectures
- AWS Lambda, Google Cloud Functions, and Azure Functions allow event-driven processing without managing infrastructure.
- Example: A serverless function automatically resizes images when a user uploads them.
2. Edge Computing & Event Streaming
- IoT devices generate massive event streams, requiring low-latency processing at the edge.
- Example: Self-driving cars process events locally instead of sending all data to the cloud.
3. AI-Powered Event Processing
- Machine learning models analyze real-time events to detect fraud, anomalies, and predictions.
- Example: AI detects credit card fraud in milliseconds by analyzing transaction patterns.
Understanding these advanced concepts is crucial for building resilient and high-performance distributed systems.
I’ll be posting daily to stay consistent in both my learning followed by daily pushups. Thank you!
Follow my journey:
Medium: https://ankittk.medium.com/
Instagram: https://www.instagram.com/ankitengram/