In this example, Neeraj Gupta answers the system design question, "Design a distributed message queue."
Message queues are a widely used component in distributed systems. They help different parts of a system talk to one another, even if those parts don't operate at the same time.
Messages are stored in the queue until the part of the system that needs them retrieves them. This helps make sure that information doesn't get lost when the receiver is busy or temporarily unavailable.
In that sense, a message queue works like a temporary storage room for information.
A distributed message queue runs on many servers, which are also called brokers. Brokers form a cluster that works to keep the whole system reliable.
Message queues can help different parts of a system work more independently and efficiently.
Message queues can be used in many situations, like:
Some popular message queues include RabbitMQ, Apache Kafka, Apache ActiveMQ, Google Pub/Sub, AWS SNS/SQS, and Azure Queue.
A queue's primary function is to insert and remove messages, also known as producing and consuming messages.
However, it's essential to consider the system's scalability and other non-functional requirements.
Some types of queues include:
For scalability, it's best to consider the pull model for consumers, where consumers pull messages from the queue at their own pace instead of having messages pushed to them.
To improve scalability, adding more servers or batching operations can be helpful.
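The pull model and batching can be illustrated together with a minimal sketch. The `InMemoryQueue` class and its `produce` and `pull` methods are hypothetical names used only for illustration, and the queue lives in a single process rather than on a cluster of brokers.

```python
from collections import deque


class InMemoryQueue:
    """Toy single-process stand-in for a broker (an assumption, not a real API)."""

    def __init__(self):
        self._messages = deque()

    def produce(self, message):
        # Producers only append; they never push to a consumer directly.
        self._messages.append(message)

    def pull(self, max_messages):
        # The consumer decides when to ask and how large a batch it can handle.
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch


queue = InMemoryQueue()
for i in range(5):
    queue.produce(f"event-{i}")

first = queue.pull(max_messages=3)   # a full batch of three
second = queue.pull(max_messages=3)  # the remaining two messages
```

Because the consumer chooses the batch size, a slow consumer can ask for less work without the queue needing to track its capacity.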
The message structure should include a topic, payload, and key for partitioning purposes.
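A message with these three fields might be modeled as below. This is a sketch under assumptions: the `Message` class, the `partition_for` helper, and the choice of CRC32 as the hash are illustrative and not taken from the original design.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str      # logical stream the message belongs to
    key: str        # used only to choose a partition
    payload: bytes  # opaque message body


def partition_for(message, num_partitions):
    # CRC32 gives the same result in every process, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(message.key.encode("utf-8")) % num_partitions


created = Message(topic="orders", key="customer-42", payload=b"created")
paid = Message(topic="orders", key="customer-42", payload=b"paid")

# Messages that share a key always land in the same partition.
same_partition = partition_for(created, 8) == partition_for(paid, 8)
```

Routing by key keeps related messages together, which makes it possible to preserve their relative order within one partition.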
The message queue itself should be highly scalable and able to handle abrupt spikes in traffic.
For capacity planning, assume a topic-based queue with up to 10,000 topics and an estimated 10 million messages per day, requiring about 800 GB of storage per day with a 30-day retention period.
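The figures above imply a few other numbers that are worth working out. The sketch below assumes decimal units (1 GB = 1,000,000 KB) and ignores replication, which would multiply the storage total; the variable names are illustrative.

```python
MESSAGES_PER_DAY = 10_000_000
STORAGE_PER_DAY_GB = 800
RETENTION_DAYS = 30
SECONDS_PER_DAY = 86_400

# Total data held at any time once the retention window is full.
total_retained_tb = STORAGE_PER_DAY_GB * RETENTION_DAYS / 1000  # 24.0 TB

# Average message size implied by the daily volume and daily storage.
implied_avg_message_kb = STORAGE_PER_DAY_GB * 1_000_000 / MESSAGES_PER_DAY  # 80.0 KB

# Average write rate; real peaks will be higher than this.
avg_messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY  # roughly 116
```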
Storage is a crucial component in designing a message queue system. With a large volume of messages and a read-and-write-heavy workload, the options include a SQL database, a NoSQL database, or a write-ahead log (WAL).
The write-ahead log approach is advised because it is an append-only log, so writes are sequential and never modify existing data.
Each message is added to the end of the file. However, appending to a single file can cause the file to become too large.
To handle this situation, divide the file into multiple segments, splitting it by a partitioning key such as buyer ID.
Each segment can be stored on different servers to support scalability.
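The idea of an append-only log divided into segments can be sketched as follows. Note the simplification: this version rolls over to a new segment after a fixed number of messages, which is an assumption for illustration, and it does not model key-based splitting or placing segments on different servers. `SegmentedLog` is a hypothetical name.

```python
class SegmentedLog:
    """Append-only log; each inner list stands in for one segment file."""

    def __init__(self, max_segment_messages):
        self.max_segment_messages = max_segment_messages
        self.segments = [[]]

    def append(self, record):
        # Roll over to a fresh segment once the active one is full.
        if len(self.segments[-1]) >= self.max_segment_messages:
            self.segments.append([])
        self.segments[-1].append(record)
        # The offset is the record's position in the log as a whole.
        full_segments = len(self.segments) - 1
        return full_segments * self.max_segment_messages + len(self.segments[-1]) - 1

    def read(self, offset):
        # Fixed-size segments let us locate a record without scanning.
        segment_index, position = divmod(offset, self.max_segment_messages)
        return self.segments[segment_index][position]


log = SegmentedLog(max_segment_messages=2)
offsets = [log.append(record) for record in (b"a", b"b", b"c")]
```

Existing records are never rewritten, so old segments can be deleted whole once they fall outside the retention period.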
Metadata storage and state storage are also necessary.
Metadata storage contains configuration information and state storage contains information about where each consumer last read.
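State storage can be pictured as a small map from each consumer to the next position it should read. The sketch below is an assumption-laden illustration: `OffsetStore`, `consume`, and the single in-memory list standing in for a topic are all hypothetical.

```python
class OffsetStore:
    """Remembers, per consumer and topic, the next offset to read."""

    def __init__(self):
        self._next_offset = {}

    def commit(self, consumer_id, topic, next_offset):
        self._next_offset[(consumer_id, topic)] = next_offset

    def position(self, consumer_id, topic):
        # A consumer with no stored state starts from the beginning.
        return self._next_offset.get((consumer_id, topic), 0)


topic_log = ["m0", "m1", "m2", "m3"]  # stand-in for one topic's stored messages
store = OffsetStore()


def consume(consumer_id, max_messages):
    start = store.position(consumer_id, "orders")
    batch = topic_log[start:start + max_messages]
    # Record progress so the next call resumes where this one stopped.
    store.commit(consumer_id, "orders", start + len(batch))
    return batch


first = consume("billing", 3)    # reads m0 to m2
resumed = consume("billing", 3)  # picks up at m3
other = consume("audit", 1)      # an independent consumer starts at m0
```

Keeping this state outside the consumer lets a restarted consumer resume from its last position instead of rereading the whole log.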
Fault tolerance is the ability of a system to continue operating even in the presence of a system fault.
A leader-follower approach is suggested, with a coordination service such as ZooKeeper used to keep track of leaders and followers.
There are also different ways to decide when a message has been written successfully, for example based on acknowledgments and on whether the message has been replicated to followers.
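One way to picture the trade-off is a leader that reports success only after enough followers have acknowledged a write. This is a simplified sketch: `Replica`, `Leader`, and the `required_acks` parameter are hypothetical, replication is a direct method call rather than a network request, and a rejected write is not rolled back from the leader's own log.

```python
class Replica:
    def __init__(self, healthy=True):
        self.healthy = healthy
        self.log = []

    def append(self, record):
        # An unhealthy follower never acknowledges.
        if not self.healthy:
            return False
        self.log.append(record)
        return True


class Leader:
    def __init__(self, followers, required_acks):
        self.log = []
        self.followers = followers
        self.required_acks = required_acks

    def write(self, record):
        self.log.append(record)
        # Count how many followers confirmed they stored the record.
        acks = sum(1 for follower in self.followers if follower.append(record))
        return acks >= self.required_acks


followers = [Replica(), Replica(healthy=False)]

# Requiring one follower ack succeeds even with one follower down.
ok_one = Leader(followers, required_acks=1).write("m0")

# Requiring both follower acks fails while a follower is down.
ok_two = Leader(followers, required_acks=2).write("m1")
```

A higher `required_acks` value makes acknowledged messages more durable, at the cost of rejecting or delaying writes when followers are unavailable.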
Implementing such an approach can help ensure fault tolerance and the successful writing of messages.