In this system design mock interview, we asked Neeraj, an Engineering Manager at eBay, to describe the design process for Uber Eats, a food delivery app.
A system that delivers food requires many components.
The first step is to define our functional requirements.
The system should allow restaurants to easily add, update, or remove menu items and information.
On the customer side, users should be able to view, search, and filter restaurants based on distance, types of food, and estimated delivery time.
Non-functional requirements define a system's attributes. In our example, these include availability, security, and latency.
Some non-functional requirements are more critical than others.
For example, updates to restaurant information or menu items don't need to propagate immediately.
If a restaurant releases a new menu item, it’s fine if our system takes a few seconds to update.
High availability, on the other hand, is a top priority.
Security is also an important consideration.
We don’t want users to access or manipulate information they don’t own. We could use identity and access management (IAM) and logging mechanisms to prevent unauthorized access.
Lastly, we must address latency. Important metrics include the time it takes to load a restaurant page or conduct a search.
Fast response times are desired for user-facing operations such as loading a restaurant page or running a search.
Operations such as uploading images or new menu items are less critical, and it’s ok if they have a slightly higher latency.
For a system design to be effective, we need to know how much data we’re working with.
Let’s assume our Uber Eats system adds 100 restaurants and 10,000 customers daily. Over 20 years, the system would accumulate roughly 730,000 restaurants and ~73 million customers.
That’s a fair estimate for the amount of data our system must be able to support.
Lastly, let’s also estimate daily restaurant views and searches to be around 100 million per day.
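These figures can be checked with a quick back-of-envelope calculation (assuming 365 days per year and a uniform daily rate):

```python
# Back-of-envelope capacity estimate (assumes 365 days/year, constant daily rates).
RESTAURANTS_PER_DAY = 100
CUSTOMERS_PER_DAY = 10_000
YEARS = 20
DAYS = 365 * YEARS

total_restaurants = RESTAURANTS_PER_DAY * DAYS  # 730,000
total_customers = CUSTOMERS_PER_DAY * DAYS      # 73,000,000

# ~100M views/searches per day translates to an average query rate:
views_per_second = 100_000_000 / (24 * 60 * 60)  # ~1,157 QPS
```

Peak traffic would of course be several times higher than the daily average, which is worth mentioning in an interview.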
Each component (such as a menu, restaurant, order, etc.) in our system has data fields. A data model lists those fields and describes how each component interacts with other components.
For example, a menu item would have fields such as a name, description, price, and image. A restaurant would have fields such as a name, address, and menu. Customers would have fields such as a name, delivery address, and order history.
A common task in location-specific applications is calculating delivery times between two locations. Here, we need to calculate delivery times between restaurants and customers.
One way to do this is with GeoHashing. This method divides the world map into small grids and expresses each location as a short alphanumeric string.
We would use a longer alphanumeric string to represent a smaller (more specific) geographic area. Conversely, a larger geographic area can be represented with a shorter string.
Using this method, we can group multiple restaurants in a small area, such as a shopping center, and treat them as a single data unit. The delivery time will be relatively constant whether it's from the first shop or the tenth. This simplifies our design.
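As a sketch of how geohashing works, here is a minimal pure-Python encoder using the standard base32 bit-interleaving scheme. The `precision` parameter controls the string length, and nearby points share a common prefix, which is what lets us group restaurants in the same small area:

```python
# Minimal geohash encoder (standard base32 bit-interleaving scheme).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits, code = [], []
    even = True  # geohash alternates bits, starting with longitude
    while len(code) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half of the range
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half of the range
        even = not even
        if len(bits) == 5:  # every 5 bits become one base32 character
            code.append(BASE32[int("".join(map(str, bits)), 2)])
            bits = []
    return "".join(code)
```

A shorter hash is a prefix of the longer one for the same point, so truncating a geohash widens the area it covers, exactly as described above.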
We could use a relational database storing data with separate tables for customers, menu items, restaurants, etc. We could also create multiple read replicas to speed up data reading.
Databases typically receive more read requests than write requests. A read replica is a copy of the primary database used exclusively to serve read requests.
Since the number of users is high in our example (~73 million customers), a NoSQL database such as Cassandra may be more appropriate, as it supports automatic sharding.
In designing our system, we could start by creating a layer that orchestrates services across different platforms, such as iOS, Android, mobile web, and desktop web. This will be useful for sharing back-end resources.
Next, we could implement a service for restaurants.
This would have a load balancer on top to distribute traffic across multiple machines.
When storing data, we must be careful with specific fields such as menu images. Sending images over the network and storing them in a database may be too resource-intensive, so it's better to use object storage like S3 for efficiency.
We could also have an image service with a moderation policy. This policy would use machine learning models to ensure all uploaded images are appropriate and meet quality standards.
Many trained models for image classification already exist and can be leveraged. These models use a classification algorithm to determine whether an image is appropriate.
The model's performance can be determined by testing it on a sample data set and calculating the precision and recall values.
Imagine you go fishing. You use a wide net and catch 80 out of 100 fish in a lake. That’s 80% recall. But you also get 80 rocks in your net. That’s 50% precision, because half of what you caught (80 of 160 items) is fish.
You could instead use a smaller net and target an area with many fish and no rocks, but you might catch only 20 fish. That’s 20% recall and 100% precision.
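The fishing analogy maps directly onto the standard precision and recall formulas, with caught fish as true positives, rocks as false positives, and missed fish as false negatives:

```python
def precision_recall(tp, fp, fn):
    """tp: true positives, fp: false positives, fn: false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Wide net: 80 of 100 fish caught (tp=80, fn=20), plus 80 rocks (fp=80).
wide_p, wide_r = precision_recall(tp=80, fp=80, fn=20)    # 0.5, 0.8
# Small net: 20 fish caught, 80 missed, no rocks.
small_p, small_r = precision_recall(tp=20, fp=0, fn=80)   # 1.0, 0.2
```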
If our image validation process is slow, one solution could be to optimize the API or add more computing resources, such as GPUs or multiple CPUs, to increase speed.
We can also speed up data retrieval by caching frequently accessed data with an eviction policy such as LRU (Least Recently Used), or by compressing stored data.
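As an illustration of the LRU policy, here is a minimal sketch built on Python's `collections.OrderedDict` (a production system would more likely use a dedicated cache such as Redis with an LRU eviction setting):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key at capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```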
Our system will need a search service that identifies the restaurants closest to a user based on location.
For distance-based searches, such as finding nearby restaurants, we can use Elasticsearch, which provides different geo-queries. A restaurant’s coordinates can be stored in Elasticsearch, and the user's coordinates can be used to run a geo-query to determine nearby restaurants.
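A nearby-restaurant lookup might use a query body like the following. The index and field names (`restaurants`, `location`) are assumptions for this example; the `geo_distance` filter itself is standard Elasticsearch query DSL:

```python
# Hypothetical user position and radius; "location" is assumed to be a
# geo_point field on a "restaurants" index.
nearby_restaurants_query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "5km",  # search radius around the user
                    "location": {"lat": 40.7128, "lon": -74.0060},
                }
            }
        }
    }
}
# This body would be sent to the index's _search endpoint.
```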
We also need a way to filter restaurants by delivery time.
We can create a polygon map that defines the region a restaurant can deliver to within a specific period and store it in a database such as DynamoDB.
This allows users to view restaurants that can deliver to them within a specified time.
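Checking whether a user falls inside a delivery polygon is a point-in-polygon test. In practice you would use a geospatial library (or DynamoDB-side filtering), but the classic ray-casting algorithm sketches the idea; the zone coordinates below are hypothetical:

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: cast a ray to the right and count edge crossings.

    An odd number of crossings means the point is inside the polygon.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical 30-minute delivery zone as a list of (x, y) vertices.
zone = [(0, 0), (4, 0), (4, 4), (0, 4)]
```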