This document outlines the project’s core components:

  • The customer definition.
  • Functional and non-functional requirements.
  • Detailed capacity estimation to guide the system’s architecture.

1. Customer and Business Definition

Based on the initial analysis, we can define the customer and the core business problem the project aims to solve.

  • Customer: The primary customer is a government entity focused on smart cities and public utilities. This entity is responsible for managing and improving the efficiency, reliability, and appeal of public transportation in a major urban center such as Da Nang or Ho Chi Minh City.
  • Business Problem: The public bus system is caught in a cycle of inefficiency. Passengers face unpredictable wait times and long journeys due to traffic and rigid schedules, making the bus an unreliable option for daily commutes. Simultaneously, bus operators are forced to service every stop regardless of demand, leading to wasted fuel, increased operational costs, and wear and tear on vehicles, especially during off-peak hours. This inefficiency further contributes to the poor service that deters ridership.

2. Requirements and Scope

The system is designed to bridge the information gap between passenger demand and bus operations.

Functional Requirements

The system will provide the following key functions:

  • Real-time Bus Tracking: Passengers can view the live location of buses on a map via a mobile application.
  • Dynamic ETA Calculation: The system will provide passengers with a continuously updated Estimated Time of Arrival (ETA) that accounts for traffic and the bus’s progress.
  • Demand-Responsive Service (Digital Hail): Passengers can signal their intent to board a bus from a specific stop using the mobile app or a physical button at the bus stop.
  • In-Cabin Decision Support: Bus drivers receive real-time notifications on a simple in-cabin interface, indicating which upcoming stops have waiting passengers, allowing them to safely bypass empty ones.
  • Alternative Route Suggestions: In case of significant delays, the system will proactively suggest alternative routes or connections to passengers.
  • Administrative Dashboard: A web-based portal for operators to monitor system health, manage routes, and analyze operational data.
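The digital-hail and in-cabin decision-support functions above come down to a simple per-stop rule. The following is a minimal sketch under stated assumptions, not the project's implementation. `RouteState` and `upcoming_stop_advice` are hypothetical names. The `alighting` set (stops where an on-board passenger has asked to get off) is also an assumption: the requirements do not cover it, but a stop can only be bypassed safely when nobody is waiting there and nobody needs to alight.

```python
from dataclasses import dataclass, field


@dataclass
class RouteState:
    """Hypothetical in-memory view of one bus on its route (all names are illustrative)."""
    stops: list[str]                                   # ordered stop IDs along the route
    next_index: int = 0                                # index of the next stop the bus will reach
    hails: set[str] = field(default_factory=set)       # stops with an active app or button hail
    alighting: set[str] = field(default_factory=set)   # ASSUMPTION: on-board stop requests


def upcoming_stop_advice(state: RouteState, lookahead: int = 3) -> list[tuple[str, str]]:
    """Return ("STOP" or "SKIP") advice for the next `lookahead` stops on the route."""
    advice = []
    for stop_id in state.stops[state.next_index : state.next_index + lookahead]:
        # A stop may be bypassed only if nobody is waiting there and nobody wants to get off.
        must_stop = stop_id in state.hails or stop_id in state.alighting
        advice.append((stop_id, "STOP" if must_stop else "SKIP"))
    return advice


if __name__ == "__main__":
    bus = RouteState(stops=["S1", "S2", "S3", "S4"], next_index=1, hails={"S3"})
    print(upcoming_stop_advice(bus))  # [('S2', 'SKIP'), ('S3', 'STOP'), ('S4', 'SKIP')]
```

The lookahead of three stops is arbitrary. A real driver interface would need a lookahead that gives the driver enough time to react safely.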

Non-Functional Requirements

For the overall system (if needed):

| Category | Requirement | Priority | Justification |
| --- | --- | --- | --- |
| Performance | ETA should be accurate to within ±1 minute 95% of the time. | High | Core to building passenger trust and system reliability. |
| Performance | Location and status updates must be reflected in the app in under 5 seconds. | High | Ensures a “real-time” user experience. |
| Reliability | The system must maintain >98% uptime during operational hours. | High | Essential for a public utility that users depend on daily. |
| Reliability | The system must automatically recover from network disruptions within 30 seconds. | Medium | Ensures service continuity and data integrity. |
| Connectivity | Stable 4G/LTE connectivity must be maintained in at least 95% of the operational area. | High | The backbone of the real-time data exchange. |
| Durability | Physical buttons at bus stops must be weather-resistant (IP56 rated) and withstand at least 300,000 presses. | High | Ensures long-term reliability of hardware in public spaces. |
| Safety & Compliance | All hardware must meet electrical safety standards (IEC 62368-1, which replaced IEC 60950-1); in-vehicle hardware must also follow the ISO 26262 road-vehicle functional-safety standard where applicable. | High | Non-negotiable for public deployment and user safety. |
| Scalability | The architecture must support scaling to 500+ buses while still meeting the performance targets above. | Medium | Prepares the system for future expansion to a city-wide scale. |
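The ETA accuracy target in the table above (±1 minute, 95% of the time) can be checked offline by comparing logged predictions against observed arrival times. The following is a minimal sketch. It assumes predictions and actual arrivals are available as paired timestamps in seconds. `eta_accuracy` and `meets_eta_target` are hypothetical helpers.

```python
from typing import Sequence


def eta_accuracy(predicted_s: Sequence[float], actual_s: Sequence[float], tolerance_s: float = 60.0) -> float:
    """Fraction of predicted arrival times within ±tolerance_s seconds of the actual arrival."""
    if len(predicted_s) != len(actual_s) or len(predicted_s) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted_s, actual_s) if abs(p - a) <= tolerance_s)
    return hits / len(predicted_s)


def meets_eta_target(predicted_s: Sequence[float], actual_s: Sequence[float], target: float = 0.95) -> bool:
    """True if at least `target` of the predictions fall within the ±1 minute tolerance."""
    return eta_accuracy(predicted_s, actual_s) >= target


if __name__ == "__main__":
    predicted = [100, 200, 300, 400]   # predicted arrival times (seconds since a reference point)
    actual = [130, 290, 300, 350]      # observed arrival times
    print(eta_accuracy(predicted, actual))      # 0.75
    print(meets_eta_target(predicted, actual))  # False
```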
For the AWS architecture (cloud-centric):
Pillar: Performance Efficiency

Focuses on using resources efficiently to deliver the best performance.

| Requirement | Metric | Linked Capacity Assumption & Justification |
| --- | --- | --- |
| API Read Latency | The GET /bus/eta API endpoint must maintain a 99th percentile (p99) latency of < 300 ms. | With 5,000 concurrent users all potentially requesting ETAs, this stringent latency target ensures the application feels responsive even under peak load. |
| Data Ingestion Throughput | The system must sustain ingestion of 15 location updates per second (1 update/10 s for each of 150 buses) with an end-to-end processing latency of < 2 seconds. | This requirement is directly derived from the number of IoT devices (buses) and their update frequency. It ensures the data backend can handle the constant stream without backlogs. |
| Elasticity for Peak Events | The architecture must automatically scale to handle 8,000 concurrent users, a 60% increase over the normal peak load of 5,000. | This provides a realistic buffer for common urban events (e.g., festivals, football matches) that cause predictable traffic spikes, rather than an arbitrary 3x multiplier. This is a form of dynamic scaling. |

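The p99 latency metric above can be evaluated over a window of request latencies. The following is a minimal sketch that uses the nearest-rank percentile definition. This is only one of several percentile definitions, and monitoring tools may interpolate differently. The function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def p99_within_slo(latencies_ms: Sequence[float], slo_ms: float = 300.0) -> bool:
    """True if the window's p99 latency is strictly below the SLO, as in '< 300 ms'."""
    return percentile_nearest_rank(latencies_ms, 99) < slo_ms


if __name__ == "__main__":
    window = [120.0] * 990 + [450.0] * 10       # 1,000 requests, 1% of them slow
    print(percentile_nearest_rank(window, 99))  # 120.0
    print(p99_within_slo(window))               # True
```

With 1,000 samples, the p99 is the 990th-smallest value. As a result, up to 10 slow requests per 1,000 still pass the check, while an 11th would not.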
Pillar: Reliability

Ensures the system can recover from failures and meet commitments.

| Requirement | Metric | Linked Capacity Assumption & Justification |
| --- | --- | --- |
| Service Availability | Achieve 99.9% (“three nines”) monthly uptime during the 16 daily operating hours. | Critical for a public utility serving 20,000 daily users. This availability target ensures the service is dependable for daily commuters. |
| Fault Tolerance | The system must be deployed across a minimum of two Availability Zones (AZs), with automated failover. The failure of one AZ must not impact service availability. | This is a foundational best practice to ensure the workload can withstand the failure of a single Availability Zone and continue serving its thousands of users. |
| Disaster Recovery | Recovery Time Objective (RTO) = 15 minutes; Recovery Point Objective (RPO) = 5 minutes. | For a system processing constant real-time updates from 150 buses, this ensures that in a disaster, the service is restored quickly with minimal data loss. |

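The availability and recovery targets above translate into concrete budgets. The worked sketch below assumes a 30-day month and counts only the 16 operating hours. Under those assumptions, 99.9% availability allows about 28.8 minutes of downtime per month, and a 5-minute RPO could lose up to 4,500 location updates (150 buses × 30 updates each). The function names are hypothetical.

```python
def monthly_downtime_budget_minutes(availability: float, hours_per_day: float = 16, days_per_month: int = 30) -> float:
    """Allowed downtime per month, counting only operating hours (30-day month assumed)."""
    operating_minutes = hours_per_day * days_per_month * 60
    return (1 - availability) * operating_minutes


def max_location_updates_lost(rpo_minutes: float, buses: int = 150, update_interval_s: float = 10) -> int:
    """Worst-case number of bus location updates lost if the last `rpo_minutes` of data are gone."""
    updates_per_bus = rpo_minutes * 60 / update_interval_s
    return round(updates_per_bus * buses)


if __name__ == "__main__":
    print(round(monthly_downtime_budget_minutes(0.999), 1))  # 28.8
    print(max_location_updates_lost(5))                      # 4500
```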
Pillar: Security

Focuses on protecting information and systems.

| Requirement | Metric | Linked Capacity Assumption & Justification |
| --- | --- | --- |
| User Data Protection | 100% of data for the 20,000+ users (and future growth) must be encrypted at rest (AES-256) and in transit (TLS 1.2+). | Essential for protecting Personally Identifiable Information (PII) and building user trust at scale. |
| Device Authentication | The 150 bus devices and 500 bus stop devices must authenticate using unique, revocable X.509 certificates. | Prevents unauthorized devices from injecting malicious data into the system, a critical threat vector in any IoT network. |
| Network Isolation | Application and database tiers must be located in private subnets, isolated from direct public internet access. | This fundamental security posture reduces the attack surface for a system that is, by its nature, publicly exposed via its IoT endpoints and user application. |

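Taken together, the device-authentication and encryption-in-transit requirements above amount to mutual TLS (TLS 1.2 or newer), in which each device presents its own X.509 certificate. AWS IoT Core can perform this device authentication as a managed service. The sketch below only illustrates the equivalent server-side settings using Python's standard `ssl` module, assuming a self-managed endpoint. The `make_device_tls_context` name and the file paths are hypothetical. Revocation is shown as a CRL check on the device (leaf) certificate.

```python
import ssl
from typing import Optional


def make_device_tls_context(
    server_cert: Optional[str] = None,
    server_key: Optional[str] = None,
    device_ca: Optional[str] = None,
    device_crl: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side TLS context that requires every connecting device to present a certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2   # "TLS 1.2+" from the security requirements
    ctx.verify_mode = ssl.CERT_REQUIRED            # reject devices without a valid certificate
    if server_cert and server_key:
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if device_ca:
        # CA that issued the unique per-device certificates (hypothetical path)
        ctx.load_verify_locations(cafile=device_ca)
    if device_crl:
        # Certificate revocation list (PEM); revoked device certificates fail the handshake
        ctx.load_verify_locations(cafile=device_crl)
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx
```

In use, the context would wrap the listening socket, for example `ctx.wrap_socket(sock, server_side=True)`. Revoking a device then means adding its certificate to the CRL and reloading the context.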
Pillar: Cost Optimization

Focuses on avoiding unnecessary costs.

| Requirement | Metric | Linked Capacity Assumption & Justification |
| --- | --- | --- |
| Off-Peak Scaling | Compute and database resources must automatically scale down during off-peak and non-operating hours (e.g., 10 PM to 6 AM), reducing costs by at least 50% compared to peak-hour expenditure. | Our model assumes significantly lower traffic outside the 4 peak hours (about 66 vs. 520 API calls/sec) and minimal traffic outside the 16 operating hours. This NFR ensures the architecture leverages that pattern to optimize cost. |
| Data Lifecycle Management | Raw IoT data (approx. 215 MB/day) must be automatically transitioned from hot storage (e.g., S3 Standard) to archival storage (e.g., S3 Glacier) after 90 days. | Balances the need for recent data for analysis against the cost of keeping an ever-growing history (tens of gigabytes of raw data per year) in hot storage. |

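The lifecycle rule above can be expressed declaratively. The minimal sketch below builds the rule as a Python dict in the shape S3 expects for a bucket lifecycle configuration. The `raw/` prefix, the rule ID, and the function name are hypothetical. `GLACIER` corresponds to the S3 Glacier Flexible Retrieval storage class. At about 215 MB/day, a 90-day hot window keeps roughly 19 GB in S3 Standard at any time.

```python
import json


def raw_iot_lifecycle_configuration(prefix: str = "raw/", transition_days: int = 90) -> dict:
    """Lifecycle configuration moving raw IoT objects to Glacier after `transition_days` days."""
    return {
        "Rules": [
            {
                "ID": "raw-iot-to-glacier",          # hypothetical rule name
                "Filter": {"Prefix": prefix},        # hypothetical key prefix for raw IoT data
                "Status": "Enabled",
                "Transitions": [{"Days": transition_days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(raw_iot_lifecycle_configuration(), indent=2))
```

With boto3, the dict could be applied as `s3.put_bucket_lifecycle_configuration(Bucket=bucket_name, LifecycleConfiguration=config)`, where `bucket_name` is left to the deployment.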
Pillar: Operational Excellence

Focuses on running and monitoring systems to deliver business value.

| Requirement | Metric | Linked Capacity Assumption & Justification |
| --- | --- | --- |
| Centralized Monitoring | All 150 buses and 500 bus stops, plus all backend services, must send metrics and logs to a centralized system (e.g., Amazon CloudWatch). | With hundreds of distributed components, centralized monitoring is the only feasible way to get a holistic view of system health and troubleshoot issues effectively. |
| Automated Alerting | Automated alarms must trigger when key performance indicators (e.g., API p99 latency > 300 ms, ingestion failure rate > 1%) are breached for more than 5 minutes. | For a system of this scale, manual monitoring is impractical. Automated alerts enable a small operations team to manage a large, distributed infrastructure proactively. |
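The alerting rule above ("breached for more than 5 minutes") can be read as five consecutive breaching one-minute datapoints. This is an interpretation rather than a stated rule. In CloudWatch terms, it corresponds roughly to `Period=60`, `EvaluationPeriods=5`, and `DatapointsToAlarm=5`. Below is a minimal, tool-agnostic sketch of that evaluation using a hypothetical `should_alarm` helper.

```python
from typing import Sequence


def should_alarm(per_minute_values: Sequence[float], threshold: float, sustained_minutes: int = 5) -> bool:
    """True when each of the last `sustained_minutes` one-minute datapoints exceeds `threshold`."""
    if len(per_minute_values) < sustained_minutes:
        return False  # not enough data yet to judge a sustained breach
    recent = per_minute_values[-sustained_minutes:]
    return all(value > threshold for value in recent)


if __name__ == "__main__":
    p99_latency_ms = [250, 310, 320, 330, 340, 350]
    ingestion_failure_rate = [0.002, 0.015, 0.004, 0.02, 0.03, 0.02]
    print(should_alarm(p99_latency_ms, threshold=300))           # True
    print(should_alarm(ingestion_failure_rate, threshold=0.01))  # False
```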

Pillar: Sustainability (Optional)

Focuses on minimizing the environmental impacts of running cloud workloads.

| Requirement | Metric | Linked Capacity Assumption & Justification |
| --- | --- | --- |
| Workload Scheduling | Non-critical data processing tasks (e.g., generating weekly analytics reports) must be scheduled to run during periods of low energy grid carbon intensity. | Although the user base is regional, backend batch work is not tied to users' schedules, so it can run whenever the grid is cleanest. This minimizes the environmental impact of backend processing. |
| Efficient Hardware Selection | When selecting instance types, prioritize ARM-based AWS Graviton processors where workloads are compatible. | Graviton processors offer better performance per watt, reducing the overall energy consumption required to serve the 20,000 daily users. |
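One way to meet the workload-scheduling requirement above is to pick the start time whose run window has the lowest forecast grid carbon intensity. The following is a minimal sketch. It assumes an hourly intensity forecast (for example in gCO2e/kWh) is obtained from some external source. The forecast values and the `best_start_hour` helper are hypothetical.

```python
from typing import Sequence


def best_start_hour(hourly_intensity: Sequence[float], job_hours: int) -> int:
    """Offset (in hours) of the window of length `job_hours` with the lowest total forecast intensity."""
    if not 0 < job_hours <= len(hourly_intensity):
        raise ValueError("job must fit inside the forecast window")
    best_start, best_total = 0, float("inf")
    for start in range(len(hourly_intensity) - job_hours + 1):
        total = sum(hourly_intensity[start:start + job_hours])
        if total < best_total:
            best_start, best_total = start, total
    return best_start


if __name__ == "__main__":
    forecast = [520, 480, 310, 260, 270, 450]  # hypothetical gCO2e/kWh for the next six hours
    print(best_start_hour(forecast, job_hours=2))  # 3
```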

Out of Scope

To ensure focus and feasibility, the following features are explicitly out of scope for the initial version:

  • Fare Collection and Payment Processing: The system will not handle ticketing or payments.
  • Real-time Traffic Light Integration: While a future goal, the initial system will not directly interface with municipal traffic control systems.
  • B2B Fleet Management Features: The system is designed for public use and will not include features for managing private fleets, such as pre-registered passenger lists.

3. Capacity Estimation

This section outlines the estimated load and data requirements for an initial deployment in a medium-sized city like Da Nang, which has approximately 15 major bus routes.

Assumptions

Note

We deliberately choose fairly large numbers for every aspect of system usage so that the resulting architecture is comprehensive.

  • Number of Buses: 150 buses operating across all routes.
  • Bus Stops: 500 bus stops equipped with hailing buttons.
  • Operating Hours: 16 hours per day (6 AM to 10 PM).
  • Peak Hours: 4 hours per day (7-9 AM and 5-7 PM).
  • Active Users (Passengers): 20,000 daily active users (DAU).
  • Peak Concurrent Users: 5,000 users during peak hours.

Estimation Tables (temporary)

IoT Data Generation

| Data Source | Frequency | Data per Message | Daily Data per Unit | Total Daily Data |
| --- | --- | --- | --- | --- |
| Bus GPS Location | 1 update / 10 sec | 256 bytes | 1.4 MB | 210 MB |
| Bus Stop “Hail” | 50 hails / day (avg) | 128 bytes | 6.4 KB | 3.2 MB |

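The IoT rows above follow from the assumptions (16 operating hours, 150 buses, 500 stops). The sketch below recomputes them in bytes. The GPS figures match binary megabytes (1.4 MiB per bus, about 211 MiB in total, or about 221 MB in decimal units). The hail figures, by contrast, are decimal (6.4 KB per stop, 3.2 MB in total). The combined raw volume is therefore about 214 MiB (about 224 MB) per day, consistent with the ~215 MB used elsewhere. The constant and function names are illustrative.

```python
OPERATING_SECONDS = 16 * 60 * 60        # 6 AM to 10 PM
BUSES, STOPS = 150, 500
GPS_INTERVAL_S, GPS_MSG_BYTES = 10, 256
HAILS_PER_STOP_PER_DAY, HAIL_MSG_BYTES = 50, 128
MIB = 1024 * 1024


def daily_iot_bytes() -> dict[str, int]:
    """Raw daily IoT payload volume in bytes, per unit and in total."""
    gps_per_bus = (OPERATING_SECONDS // GPS_INTERVAL_S) * GPS_MSG_BYTES
    hail_per_stop = HAILS_PER_STOP_PER_DAY * HAIL_MSG_BYTES
    return {
        "gps_per_bus": gps_per_bus,
        "gps_total": gps_per_bus * BUSES,
        "hail_per_stop": hail_per_stop,
        "hail_total": hail_per_stop * STOPS,
    }


if __name__ == "__main__":
    for name, n_bytes in daily_iot_bytes().items():
        print(f"{name}: {n_bytes:,} bytes = {n_bytes / MIB:.2f} MiB = {n_bytes / 1e6:.2f} MB")
```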
API Traffic Estimation

| API Endpoint | Calls per Second (Peak) | Calls per Second (Off-Peak) |
| --- | --- | --- |
| POST /bus/location (from buses) | 15 | 15 |
| POST /stop/hail (from stops) | ~5 | ~1 |
| GET /bus/eta (from apps) | 500 | 50 |
| Total | ~520 | ~66 |

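The API totals above can be cross-checked against the assumptions. The ~500 calls/sec ETA figure is consistent with each of the 5,000 peak concurrent users polling once every 10 seconds. That polling interval is an inference, not a stated assumption. Under the same rule, the 8,000-user peak-event target from the performance requirements comes to about 820 calls/sec. Hails average only about 0.43 calls/sec over the operating day (25,000 hails over 16 hours), so the ~5 calls/sec peak figure leaves generous headroom, in line with the note on deliberately large numbers. The helper names are hypothetical.

```python
BUSES, GPS_INTERVAL_S = 150, 10
STOPS, HAILS_PER_STOP_PER_DAY = 500, 50
OPERATING_SECONDS = 16 * 60 * 60

LOCATION_RPS = BUSES / GPS_INTERVAL_S                                   # 15 calls/sec, peak and off-peak
AVERAGE_HAIL_RPS = STOPS * HAILS_PER_STOP_PER_DAY / OPERATING_SECONDS   # daily average hail rate


def eta_rps(concurrent_users: int, poll_interval_s: float = 10.0) -> float:
    """ETA calls/sec if each concurrent user polls once per interval (interval is an assumption)."""
    return concurrent_users / poll_interval_s


def total_rps(hail_rps: float, eta_calls_per_s: float) -> float:
    """Sum of the three endpoints in the API traffic table."""
    return LOCATION_RPS + hail_rps + eta_calls_per_s


if __name__ == "__main__":
    print(total_rps(hail_rps=5, eta_calls_per_s=eta_rps(5_000)))  # peak: 520.0
    print(total_rps(hail_rps=1, eta_calls_per_s=50))              # off-peak: 66.0
    print(total_rps(hail_rps=5, eta_calls_per_s=eta_rps(8_000)))  # 8,000-user event: 820.0
    print(round(AVERAGE_HAIL_RPS, 2))                              # 0.43
```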
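Storage Requirements

| Data Type | Daily Volume | Retention Policy | Estimated Yearly Storage |
| --- | --- | --- | --- |
| Raw GPS & Hail Data | ~215 MB | 90 days in hot storage, then archived | ~78 GB written per year (~19 GB resident in hot storage) |
| Aggregated Daily Metrics | ~20 MB | 3 years | ~7.3 GB per year (~22 GB once 3 years are retained; cold storage) |
| User & Route Data | Negligible | Indefinite | ~5 GB |
| Total (First Year) | | | ~91 GB |
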
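Recomputing the storage rows from their daily volumes (decimal GB, 365-day year) shows what each figure represents. Raw data accumulates to roughly 78 GB per year, but only about 19 GB of it sits in hot storage at any time under the 90-day policy. Aggregated metrics grow by about 7.3 GB per year and reach about 21.9 GB at the end of the 3-year retention period. Together with ~5 GB of user and route data, the first-year total comes to about 91 GB. The variable names are illustrative.

```python
DAYS_PER_YEAR = 365


def volume_gb(daily_mb: float, days: int) -> float:
    """Accumulated volume in decimal GB after `days` days at `daily_mb` MB per day."""
    return daily_mb * days / 1000


raw_written_per_year = volume_gb(215, DAYS_PER_YEAR)               # every raw byte written in one year
raw_in_hot_storage = volume_gb(215, 90)                            # steady state under the 90-day hot window
aggregated_per_year = volume_gb(20, DAYS_PER_YEAR)                 # aggregated metrics added per year
aggregated_at_limit = volume_gb(20, 3 * DAYS_PER_YEAR)             # size once 3 years are retained
first_year_total = raw_written_per_year + aggregated_per_year + 5  # plus ~5 GB of user & route data

if __name__ == "__main__":
    for label, gb in [
        ("raw written per year", raw_written_per_year),
        ("raw resident in hot storage", raw_in_hot_storage),
        ("aggregated per year", aggregated_per_year),
        ("aggregated at 3-year limit", aggregated_at_limit),
        ("first-year total", first_year_total),
    ]:
        print(f"{label}: {gb:.1f} GB")
```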
Bandwidth Requirements

| Traffic Direction | Peak Rate | Sustained Rate |
| --- | --- | --- |
| Ingress (from IoT devices) | ~5 Mbps | ~5 Mbps |
| Egress (to user apps) | ~150 Mbps | ~15 Mbps |
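The bandwidth rows can be sanity-checked against the message sizes and call rates above. The raw payload behind the ingress figure is only about 36 kbps, so the ~5 Mbps ingress estimate leaves more than 100× headroom for protocol overhead (TLS, HTTP or MQTT framing) and reconnection bursts. The egress figures imply an average ETA response of about 37.5 KB at both peak (500 calls/sec) and off-peak (50 calls/sec). That response size is inferred, not stated in the assumptions. The helper names are hypothetical.

```python
def to_mbps(bytes_per_second: float) -> float:
    """Convert bytes/sec to megabits/sec (decimal)."""
    return bytes_per_second * 8 / 1_000_000


# Payload-only ingress at peak: 15 location updates/sec x 256 B + ~5 hails/sec x 128 B
payload_ingress_mbps = to_mbps(15 * 256 + 5 * 128)


def implied_bytes_per_response(egress_mbps: float, calls_per_s: float) -> float:
    """Average response size implied by an egress rate and a request rate."""
    return egress_mbps * 1_000_000 / 8 / calls_per_s


if __name__ == "__main__":
    print(f"payload ingress: {payload_ingress_mbps * 1000:.1f} kbps")
    print(implied_bytes_per_response(150, 500))  # 37500.0
    print(implied_bytes_per_response(15, 50))    # 37500.0
```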