Introduction
System design is how you plan before you build. It covers not just the code but the full picture: features, traffic, failures, scale, and security. As you grow in seniority, companies expect you to think more, not just do more. Juniors build; seniors design before they build. That’s why a system design interview reveals how senior your thinking is.
🔥 Motivation
I was recently in a system design interview where, despite knowing most of the technical building blocks (queues, load balancers, API gateways, workers), the experience wasn’t great because I lacked a proper structure for answering system design questions. So, instead of regretting the past, let’s ensure better future experiences!
How To Approach a System Design Interview
💡 Tip: The interviewer will often keep the problem ambiguous and open-ended on purpose. Instead of getting confused, embrace it: use it to show how you think and how you deal with problems as a true software engineer |
---|
Step 1: Understand the Problem
First thing: don’t rush. Listen carefully to what the interviewer is asking for. Ask questions back if something is not clear. Make sure you fully understand what you need to build before you start talking.
Step 2: Define & Split Requirements Into Two Parts
- Functional Requirements: State what the system should do in clear separate bullet points.
- Non-Functional Requirements: How well the system does what it should do (scalability, security, fault tolerance). Communicate them clearly, so the interviewer sees you are thinking about the full picture.
Step 3: Address Functional Requirements & High Level Design
- Draw the big pieces, such as the client, server, database, cache, queue, and load balancer.
- Explain how they talk to each other. No need to go deep yet, just show that you know how to build a full system.
Step 4: Address Non-Functional Requirements & Low Level Design
- Walk through every part of the system, state how it could fail, and explain how you plan to prevent that component’s failure from taking down the whole system.
- Pick the potential bottlenecks in the system and discuss how you can scale or harden them: for example, sharding the database, running more replicas of the consumer worker, or configuring the message queue to write jobs to disk so that after a failure or restart it resumes where it left off.
- Finally, highlight potential security risks in the current design and address them.
💡 Tip: Especially at big companies, it really helps to do some rough calculations of memory, throughput, and how many requests per second the system can handle, using back-of-the-envelope estimation |
---|
Practical Example: Design a Webhook Service
Step 1: Understand the Problem
Webhooks are a way to get real-time updates from other systems. Unlike normal APIs, where we keep asking “Any updates?… yooo anything new?…”, webhooks push data to us when something happens. For example, Stripe sends us a webhook when a payment completes, Shopify when an order ships, and GitHub when code is pushed.
Step 2: Define & Split Requirements
Functional Requirements
- Accept webhook events from external systems
- Process these events and do operations based on them
- Save original event data and processing results for tracking
Non-Functional Requirements
- High availability: system must work even when parts fail
- Security: prevent the webhook URL ID from being stolen or misused, and rate-limit incoming requests
- At-least-once processing: each event must be processed at least once
- Scalability: the system can handle millions of requests per day
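A quick back-of-the-envelope calculation helps put “millions of requests per day” into perspective. The sketch below is only an illustration: the input figures (5 million events per day, 2 KB average payload, a 10x peak factor) are assumptions picked for the example, not numbers from a real system.

```python
# Back-of-the-envelope estimate for the webhook service.
# All input numbers used below are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate(events_per_day: int, avg_payload_bytes: int, peak_factor: float) -> dict:
    """Return rough throughput and storage figures."""
    avg_rps = events_per_day / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor
    storage_gb_per_day = events_per_day * avg_payload_bytes / 1e9
    return {
        "avg_rps": round(avg_rps, 1),
        "peak_rps": round(peak_rps, 1),
        "storage_gb_per_day": round(storage_gb_per_day, 2),
        "storage_gb_per_year": round(storage_gb_per_day * 365, 1),
    }


if __name__ == "__main__":
    print(estimate(events_per_day=5_000_000, avg_payload_bytes=2_000, peak_factor=10))
```

Under these assumptions the average load is only about 58 requests per second, so the interesting part of the discussion is the peak (about 580 requests per second) and the storage growth (about 10 GB per day of raw payloads).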
Step 3: Address Functional Requirements & High Level Design
Let’s start with a basic design. When an external system sends a webhook event, our service needs to:
- Have an API endpoint to receive the event
- Process it
- Save it to the database
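The basic design is a straight line: external system → API endpoint (request handler) → database.
But this design has a problem: if the handler fails after processing the event but before saving it to the database, we lose the event.
A better design puts a message queue in the middle: external system → request handler → message queue → queue consumer → database.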
How this works:
- The external system sends an event to our API
- The Request Handler accepts the event and puts it in the message queue
- The Request Handler sends a 200 OK response
- The Queue Consumer takes the event from the queue
- The Queue Consumer processes the event
- The Queue Consumer saves the result to the database
- Once the result is saved, the event is removed from the queue
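The flow above can be sketched in a few lines. This is a toy model, not a real broker: `InMemoryQueue`, `handle_request`, and `consume_one` are hypothetical names made up for the example, and the “database” is just a Python list. The point it tries to show is the ordering: enqueue before answering 200 OK, and acknowledge (remove) the message only after the save succeeds.

```python
import uuid
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a message broker with explicit acknowledgements."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # messages handed out but not yet acked

    def publish(self, body: dict) -> str:
        message_id = str(uuid.uuid4())
        self._ready.append((message_id, body))
        return message_id

    def receive(self):
        """Hand out the next message but keep it until it is acked."""
        if not self._ready:
            return None
        message_id, body = self._ready.popleft()
        self._in_flight[message_id] = body
        return message_id, body

    def ack(self, message_id: str) -> None:
        self._in_flight.pop(message_id, None)

    def nack(self, message_id: str) -> None:
        """Put an unacknowledged message back so it can be retried."""
        body = self._in_flight.pop(message_id)
        self._ready.append((message_id, body))

    def pending(self) -> int:
        return len(self._ready) + len(self._in_flight)


def handle_request(queue: InMemoryQueue, event: dict) -> int:
    """Request handler: enqueue first, then answer 200 OK."""
    queue.publish(event)
    return 200


def consume_one(queue: InMemoryQueue, database: list, process) -> bool:
    """Consumer: process, save, and only then ack the message."""
    received = queue.receive()
    if received is None:
        return False
    message_id, event = received
    try:
        result = process(event)
        database.append({"event": event, "result": result})
    except Exception:
        queue.nack(message_id)  # leave it in the queue for a later retry
        return False
    queue.ack(message_id)
    return True
```

If processing or saving raises, the message goes back to the queue instead of being lost, which is what gives us at-least-once processing.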
Step 4: Address Non-Functional Requirements & Low Level Design
Handling Failures
Request Handler Failures
- If the handler fails before putting the event in the queue, the external system won’t get a 200 OK, so it will retry
- Use timeouts and circuit breakers so requests don’t hang
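To make the circuit breaker idea concrete, here is a minimal sketch. It assumes the handler wraps its call to a dependency (for example, publishing to the queue) in `CircuitBreaker.call`; the class name, the threshold of 3 failures, and the 30-second reset timeout are all hypothetical choices for illustration. The wrapped call is still expected to have its own timeout.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the breaker can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a dependency that is down.
                raise CircuitOpenError("dependency is failing; rejecting call")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

When the circuit is open, the handler can return an error right away, so the external system retries later instead of holding a connection open.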
Message Queue Failures
- Use durable queues that save messages to disk
- Set up the queue on multiple servers so that if one fails, the others keep working
Queue Consumer Failures
- Run multiple consumer instances
- Only remove a message from the queue after a successful database save
- Auto-restart failed consumers
Database Failures
- Retry database writes with increasing delays
- Set up database replication
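“Retry with increasing delays” usually means exponential backoff. Here is a minimal sketch, assuming the database write is passed in as a plain callable; `retry_with_backoff` and its default values are hypothetical, and the random jitter is an addition of mine to keep many consumers from retrying at the same moment.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(); on failure wait 0.5s, 1s, 2s, ... (plus jitter) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            # Real code would catch only transient errors (timeouts, lost connections).
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

If all attempts fail, the exception propagates, so the consumer does not acknowledge the message and the event stays in the queue.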
Security Concerns
- HMAC Signatures:
  - The external system and our service share a secret key
  - The external system sends each request with an HMAC hash of the payload
  - Our service calculates the same hash and compares
  - If they match, we know the request is real
- IP Whitelisting:
  - Only accept requests from known IP addresses
- Rate Limiting:
  - Limit requests from the same client (like 50 per minute)
  - Prevents attacks that send too many requests (DoS attacks)
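Two of these checks are easy to sketch with the Python standard library. The code below is an illustration under assumptions: HMAC-SHA256 with a hex-encoded signature is one common choice (real providers define their own header names and signing formats, so check their docs), and `SlidingWindowRateLimiter` is a hypothetical in-memory limiter that would not be shared across multiple handler instances.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque


def sign(secret: bytes, payload: bytes) -> str:
    """What the sender does: HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """What we do: recompute the hash and compare in constant time."""
    expected = sign(secret, payload)
    return hmac.compare_digest(expected.encode(), received_signature.encode())


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit=50, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # client_id -> timestamps of recent requests

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many characters of the signature matched.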
Handling Duplicate and Out-of-Order Requests
For Duplicates
- Use idempotency keys: store a unique ID for each event
- Before processing, check whether this ID has already been seen
- Some message queues have built-in deduplication
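A minimal sketch of the idempotency check, assuming each event carries a unique ID from the sender. `IdempotentProcessor` is a hypothetical name, and the dictionary stands in for a database table with a unique constraint on the event ID; in a real system the check-and-store step has to be atomic, which this in-memory version does not try to show.

```python
class IdempotentProcessor:
    """Process each event ID at most once and remember the result."""

    def __init__(self, process):
        self._process = process
        self._seen = {}  # event_id -> stored result (stand-in for a DB table)

    def handle(self, event_id: str, event: dict):
        """Return (result, processed_now)."""
        if event_id in self._seen:
            # Duplicate delivery: return the stored result, do no work.
            return self._seen[event_id], False
        result = self._process(event)
        self._seen[event_id] = result
        return result, True
```

Because the queue gives us at-least-once delivery, this check is what keeps a redelivered event from being processed twice.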
For Out-of-Order Events
- Don’t assume events arrive in the correct order
- Make the processing logic handle any order
- Example: if “invoice.paid” arrives before “invoice.created”:
  - Get the latest data from the source API
  - Use timestamps to know which event is newer
  - Skip processing outdated events
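The timestamp check can be sketched like this. The event fields (`object_id`, `status`, `created_at`) are hypothetical names for the example, not any provider’s real payload format, and the dictionary stands in for the stored state of each object.

```python
def apply_if_newer(state: dict, event: dict) -> bool:
    """Apply the event only if it is newer than what we already stored."""
    object_id = event["object_id"]
    event_time = event["created_at"]  # e.g. a Unix timestamp set by the sender
    current = state.get(object_id)
    if current is not None and event_time <= current["updated_at"]:
        return False  # outdated event: skip it
    state[object_id] = {"status": event["status"], "updated_at": event_time}
    return True


if __name__ == "__main__":
    state = {}
    # "paid" arrives first even though "created" happened earlier.
    apply_if_newer(state, {"object_id": "inv_1", "status": "paid", "created_at": 200})
    apply_if_newer(state, {"object_id": "inv_1", "status": "created", "created_at": 100})
    print(state["inv_1"]["status"])  # paid
```

The late “created” event is ignored because its timestamp is older than the state we already hold, so the invoice stays “paid”.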
By following these steps, we create a webhook service that’s reliable, secure, and able to handle real-world problems like failures, duplicates, and out-of-order events.