System Design Deep-dive

Region: ap-southeast-1 (Singapore), chosen because it offers the best latency for users in Vietnam.
VPC strategy: Multi-AZ deployment across two Availability Zones for high availability.

For each component in the diagram, we choose the AWS service that will implement it.

| Component in Diagram | Recommended AWS Service | Justification & Role in Architecture |
|---|---|---|
| Route 53 | Amazon Route 53 | Acts as the DNS service. An alias record points our public domain name at the Edge Load Balancer (an ALB exposes a DNS name rather than fixed IP addresses). Private hosted zones can also provide DNS for internal service discovery. |
| Edge Load Balancer | Application Load Balancer (ALB) | The main entry point for all user traffic. As a Layer 7 load balancer, it inspects HTTP requests and routes them to the API Gateway. It also terminates SSL/TLS and integrates with AWS WAF for security. |
| API Gateway | Amazon API Gateway | The front door for the microservices. It manages API keys, request validation, throttling, and authorization, and routes requests to the appropriate backend services through the internal Service Load Balancer (reached via a VPC link). |
| Service Load Balancer | Application Load Balancer (internal) | This internal ALB distributes traffic from the API Gateway to the microservices. Path-based listener rules with one target group per service give each service a stable endpoint, so the services can scale independently. |
| ‘Digital Hail’ Service, Metadata Service, Real-time Processing & ETA Service | AWS Fargate on Amazon ECS, or AWS Lambda | The compute layer. AWS Fargate (serverless containers) suits services that run continuously or need longer processing times. AWS Lambda (capped at 15 minutes per invocation) suits event-driven, short-running tasks such as the “Real-time Processing & ETA Service”, which is triggered by SQS messages. |
| Notification Service | Amazon Simple Notification Service (SNS) | A pub/sub messaging service well suited to fanning out notifications. It can send alerts to passengers (mobile push notifications), drivers (through an internal app), and administrators (email/SMS). |
| MQTT Ingestion Service | AWS IoT Core | Provides a fully managed MQTT broker; MQTT is the de facto standard protocol for IoT devices. An IoT rule can forward incoming messages to the SQS queue. |
| Message Queue (SQS) | Amazon Simple Queue Service (SQS) | A fully managed message queue that decouples data ingestion from the processing services. It buffers messages during traffic spikes (retaining them for up to 14 days), so bursts are absorbed instead of dropped, which improves the overall resilience of the system. |
| Time-Series Database | Amazon Timestream | A serverless time-series database optimized for storing and analyzing data points over time, which matches our high-frequency GPS location data. |
| Caching Layer (Redis) | Amazon ElastiCache for Redis | A managed in-memory data store that gives low-latency access to frequently requested data such as ETAs and bus locations, reducing load on the databases and improving app performance. |
| Relational Database | Amazon RDS for PostgreSQL | A managed relational database service that handles backups, patching, and scaling. Its Multi-AZ deployment option maintains a synchronously replicated standby in another Availability Zone and fails over to it automatically, giving high availability for the critical metadata. |
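The caching row can be illustrated with the cache-aside pattern. This is our assumption about how the services would use ElastiCache; the table does not name a pattern. The sketch keeps a plain dictionary with per-key expiry as a stand-in for Redis so it runs without a cluster. `TTLCache`, `get_eta`, the `eta:{bus_id}:{stop_id}` key scheme, the 15-second TTL, and the `compute` callback (standing in for the slower Timestream/RDS path) are all hypothetical. With redis-py, the two cache calls would map to `r.get(key)` and `r.set(key, value, ex=ttl)`.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for ElastiCache for Redis: entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be shown without sleeping
        self._data: dict = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[float]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: float) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def get_eta(cache: TTLCache, bus_id: str, stop_id: str,
            compute: Callable[[str, str], float]) -> float:
    """Cache-aside read: return a cached ETA (seconds) or compute it and cache it."""
    key = f"eta:{bus_id}:{stop_id}"  # hypothetical key scheme
    cached = cache.get(key)
    if cached is not None:
        return cached               # hit: the database is not touched
    eta = compute(bus_id, stop_id)  # miss: slow path (hypothetical Timestream/RDS query)
    cache.set(key, eta)
    return eta


# Demo with a fake clock and a fake slow path.
fake_now = [0.0]
db_calls = []


def fake_compute(bus_id: str, stop_id: str) -> float:
    db_calls.append((bus_id, stop_id))
    return 240.0  # made-up ETA for the demo


cache = TTLCache(ttl=15.0, clock=lambda: fake_now[0])
get_eta(cache, "bus-42", "stop-7", fake_compute)  # miss -> computes
get_eta(cache, "bus-42", "stop-7", fake_compute)  # hit
fake_now[0] = 20.0                                # TTL has elapsed
get_eta(cache, "bus-42", "stop-7", fake_compute)  # miss again
print(len(db_calls))  # 2
```

A short TTL keeps ETAs roughly in step with new GPS reports while still absorbing repeated reads for popular stops.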

Cost Estimations

Case 1:

Your application offers a free account and a premium account that guarantees faster processing. You need to ensure that the premium users who paid for the service have higher priority than your free members. How do you design your architecture to address this requirement?

API Gateway Usage Plans & Throttling

We use two Amazon API Gateway features, API keys and usage plans, to put users into tiers.

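  • API Keys: Issue each client an API key and associate it with either the Free or the Premium usage plan.
  • Usage Plans:
    • Free Plan: Set a strict rate limit (e.g., 5 requests/second) and a low burst limit. Requests beyond these limits are rejected with a 429 Too Many Requests error.
    • Premium Plan: Set a high rate limit (e.g., 50 requests/second) and a high burst limit so that premium requests are rarely throttled during congestion. Usage-plan throttling is applied on a best-effort basis and still sits under the account-level and stage-level limits, so it gives premium users more headroom rather than a hard guarantee.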
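API Gateway enforces rate and burst limits with the token bucket algorithm: the rate is how fast the bucket refills and the burst is its capacity. The sketch below simulates that behaviour locally for the two plans so the effect of the numbers is easy to see. It is a model, not API Gateway's implementation; `TokenBucket`, `simulate`, and `PLANS` are hypothetical names, and the burst sizes (10 for Free, 100 for Premium) are assumptions, since the plans above only say "low" and "high".

```python
# Hypothetical plan limits: (rate in requests/second, burst capacity).
# The rates come from the plans above; the burst sizes are assumptions.
PLANS = {"free": (5.0, 10), "premium": (50.0, 100)}


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # bucket starts full
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last request, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each request consumes one token
            return True
        return False            # API Gateway would answer 429 Too Many Requests


def simulate(plan: str, n_requests: int, interval: float) -> dict:
    """Send n_requests spaced `interval` seconds apart; count 200s and 429s."""
    rate, burst = PLANS[plan]
    bucket = TokenBucket(rate, burst)
    counts = {200: 0, 429: 0}
    for i in range(n_requests):
        status = 200 if bucket.allow(i * interval) else 429
        counts[status] += 1
    return counts


# 30 requests arriving at the same instant:
print("free", simulate("free", 30, 0.0))        # free {200: 10, 429: 20}
print("premium", simulate("premium", 30, 0.0))  # premium {200: 30, 429: 0}
```

A burst of 30 simultaneous requests fits inside the premium bucket but overflows the free one after 10. A steady 4 requests/second from a free user never triggers a 429, because the refill rate of 5 tokens/second keeps up.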

The “Priority Queue” Pattern

  1. Create Two SQS Queues:
    • Standard Queue (free users)
    • Premium Priority Queue (premium users)
  2. Routing: The Ingestion Service (or API Gateway) checks the user’s tier.
    • If Premium: send the message to the Premium Priority Queue.
    • If Free: send the message to the Standard Queue.
  3. The Consumer (Real-time Processing Service):
    • Configure the worker service (ECS) to poll the Premium Priority Queue first.
    • Only when the Premium queue is empty does it process messages from the Standard Queue.
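The routing and consumer steps above can be sketched with two in-memory FIFO deques standing in for the SQS queues. `Message`, `TieredQueues`, `route`, `poll`, `drain`, and `demo` are hypothetical names. A real worker would read each queue through the SQS `ReceiveMessage` API (for example boto3's `receive_message`) and delete messages after processing, and SQS standard queues only offer best-effort ordering. With the default `max_premium_streak=None`, `poll` implements the strict priority described in step 3. The optional cap is our addition, not part of the design above: under sustained premium load, strict priority can starve the Standard Queue, and capping consecutive premium messages is one common mitigation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    user_id: str
    tier: str      # "premium" or "free"
    payload: str


class TieredQueues:
    """Two FIFO deques standing in for the Premium Priority and Standard SQS queues."""

    def __init__(self, max_premium_streak: Optional[int] = None) -> None:
        self.premium: deque = deque()
        self.standard: deque = deque()
        # None = strict priority (step 3 as written). An int caps how many premium
        # messages are taken in a row while free messages are waiting (our addition).
        self.max_premium_streak = max_premium_streak
        self._streak = 0

    def route(self, msg: Message) -> None:
        """Step 2: the ingestion side picks a queue from the user's tier."""
        if msg.tier == "premium":
            self.premium.append(msg)
        else:
            self.standard.append(msg)

    def poll(self) -> Optional[Message]:
        """Step 3: check the premium queue first, then fall back to the standard queue."""
        capped = (
            self.max_premium_streak is not None
            and self._streak >= self.max_premium_streak
            and len(self.standard) > 0
        )
        if self.premium and not capped:
            self._streak += 1
            return self.premium.popleft()
        if self.standard:
            self._streak = 0
            return self.standard.popleft()
        return None  # both queues empty


def drain(queues: TieredQueues) -> list:
    """Process every queued message and return payloads in processing order."""
    order = []
    while (msg := queues.poll()) is not None:
        order.append(msg.payload)
    return order


def demo(queues: TieredQueues) -> list:
    # Interleave three free and three premium messages, then drain.
    for i in range(3):
        queues.route(Message(f"f{i}", "free", f"free-{i}"))
        queues.route(Message(f"p{i}", "premium", f"prem-{i}"))
    return drain(queues)


print(demo(TieredQueues()))
print(demo(TieredQueues(max_premium_streak=2)))
```

Strict mode processes every premium message before any free one; with a cap of 2, one free message is served after every two premium messages while both queues have work. A Lambda SQS trigger polls each queue through its own event source mapping and has no notion of one queue outranking another, which fits the choice of a long-running ECS worker for this step.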