Week 1

This document outlines the project’s core components:

The customer definition.
Functional and non-functional requirements.
Detailed capacity estimation to guide the system’s architecture.

1. Customer and Business Definition

Based on the initial analysis, we can define the customer and the core business problem the project aims to solve.

Customer: The primary customer is a Government-type entity focused on Smart Cities and Public Utilities. This entity is responsible for managing and improving the efficiency, reliability, and appeal of public transportation in a major urban center, such as Da Nang or Ho Chi Minh City.
Business Problem: The public bus system is caught in a cycle of inefficiency. Passengers face unpredictable wait times and long journeys due to traffic and rigid schedules, making the bus an unreliable option for daily commutes. Simultaneously, bus operators are forced to service every stop regardless of demand, leading to wasted fuel, increased operational costs, and wear and tear on vehicles, especially during off-peak hours. This inefficiency further contributes to the poor service that deters ridership.

2. Requirements and Scope

The system is designed to bridge the information gap between passenger demand and bus operations.

Functional Requirements

The system will provide the following key functions:

Real-time Bus Tracking: Passengers can view the live location of buses on a map via a mobile application.
Dynamic ETA Calculation: The system will provide passengers with a continuously updated Estimated Time of Arrival (ETA) that accounts for traffic and the bus’s progress.
Demand-Responsive Service (Digital Hail): Passengers can signal their intent to board a bus from a specific stop using the mobile app or a physical button at the bus stop.
In-Cabin Decision Support: Bus drivers receive real-time notifications on a simple in-cabin interface, indicating which upcoming stops have waiting passengers, allowing them to safely bypass empty ones.
Alternative Route Suggestions: In case of significant delays, the system will proactively suggest alternative routes or connections to passengers.
Administrative Dashboard: A web-based portal for operators to monitor system health, manage routes, and analyze operational data.
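
The Digital Hail and In-Cabin Decision Support functions can be illustrated with a minimal sketch. The names `HailBoard`, `hail`, `clear`, and `stops_to_serve` are hypothetical, and the in-memory set is only a stand-in for whatever datastore the system ends up using.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HailBoard:
    """Tracks which stops currently have a waiting passenger (illustrative only)."""

    active_stops: set[str] = field(default_factory=set)

    def hail(self, stop_id: str) -> None:
        # Called when a passenger presses the stop button or hails from the app.
        self.active_stops.add(stop_id)

    def clear(self, stop_id: str) -> None:
        # Called once a bus has serviced the stop.
        self.active_stops.discard(stop_id)

    def stops_to_serve(self, upcoming_stops: list[str]) -> list[str]:
        # Keep route order so the driver sees the next required stop first.
        return [s for s in upcoming_stops if s in self.active_stops]


board = HailBoard()
board.hail("stop-12")
board.hail("stop-15")
upcoming = ["stop-11", "stop-12", "stop-13", "stop-15"]
print(board.stops_to_serve(upcoming))  # ['stop-12', 'stop-15']
```

Any upcoming stop missing from the returned list has no recorded hail, which is the signal the in-cabin interface would use to mark it as bypassable.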

Non-Functional Requirements

For the system (if needed)

Category	Requirement	Priority	Justification
Performance	ETA should be accurate to within ±1 minute 95% of the time.	High	Core to building passenger trust and system reliability.
Performance	Location and status updates must be reflected in the app in under 5 seconds.	High	Ensures a “real-time” user experience.
Reliability	The system must maintain >98% uptime during operational hours.	High	Essential for a public utility that users depend on daily.
Reliability	The system must automatically recover from network disruptions within 30 seconds.	Medium	Ensures service continuity and data integrity.
Connectivity	Stable 4G/LTE connectivity must be maintained in at least 95% of the operational area.	High	The backbone of the real-time data exchange.
Durability	Physical buttons at bus stops must be weather-resistant (IP56 rated) and withstand at least 300,000 presses.	High	Ensures long-term reliability of hardware in public spaces.
Safety & Compliance	All hardware must meet applicable electrical safety standards (IEC 60950-1, or its successor IEC 62368-1), and in-vehicle electronics must follow the functional safety standard for road vehicles (ISO 26262).	High	Non-negotiable for public deployment and user safety.
Scalability	The architecture must support scaling to 500+ buses without a significant drop in performance.	Medium	Prepares the system for future expansion to a city-wide scale.
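
The ETA accuracy requirement (within ±1 minute, 95% of the time) is only useful if it can be measured. The sketch below shows one way to check it from logged prediction errors; the function name `eta_accuracy` and the sample values are hypothetical.

```python
def eta_accuracy(errors_seconds: list[float], tolerance_seconds: float = 60.0) -> float:
    """Return the fraction of ETA predictions whose absolute error is within tolerance."""
    if not errors_seconds:
        raise ValueError("at least one sample is required")
    within = sum(1 for e in errors_seconds if abs(e) <= tolerance_seconds)
    return within / len(errors_seconds)


# Hypothetical sample: predicted minus actual arrival time, in seconds.
sample_errors = [12.0, -45.0, 30.0, 75.0, -5.0, 58.0, -61.0, 20.0, 0.0, 40.0]
accuracy = eta_accuracy(sample_errors)
print(f"{accuracy:.0%} within tolerance; target met: {accuracy >= 0.95}")
```

In this made-up sample two of the ten predictions miss by more than a minute, so the 95% target would not be met.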

For the actual AWS Architecture (Cloud-Centric)

Pillar: Performance Efficiency

Focuses on using resources efficiently to deliver the best performance.

Requirement	Metric	Linked Capacity Assumption & Justification
API Read Latency	The `GET /bus/eta` API endpoint must maintain a 99th percentile (p99) latency of < 300ms.	With 5,000 concurrent users all potentially requesting ETAs, this stringent latency target ensures the application feels responsive even under peak load.
Data Ingestion Throughput	The system must sustain ingestion of 15 location updates per second (1 update/10s for each of 150 buses) with an end-to-end processing latency of < 2 seconds.	This requirement is directly derived from the number of IoT devices (buses) and their update frequency. It ensures the data backend can handle the constant stream without backlogs.
Elasticity for Peak Events	The architecture must automatically scale to handle 8,000 concurrent users, a 60% increase over the normal peak load of 5,000.	This provides a realistic buffer for common urban events (e.g., festivals, football matches) that cause predictable traffic spikes, rather than an arbitrary 3x multiplier. This is a form of dynamic scaling.
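
To make the p99 latency target concrete, here is a minimal sketch of a nearest-rank percentile check. The helper `percentile_nearest_rank` and the latency samples are hypothetical; a monitoring service would normally compute this statistic itself.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies (ms) for 100 requests to GET /bus/eta.
latencies_ms = [120.0] * 98 + [280.0, 450.0]
p99 = percentile_nearest_rank(latencies_ms, 99)
print(f"p99 = {p99} ms; target met: {p99 < 300}")
```

A single slow outlier (450 ms here) does not break the target, because p99 tolerates 1% of requests exceeding the threshold.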

Pillar: Reliability

Ensures the system can recover from failures and meet commitments.

Requirement	Metric	Linked Capacity Assumption & Justification
Service Availability	Achieve 99.9% (“three nines”) monthly uptime during the 16 daily operating hours.	Critical for a public utility serving 20,000 daily users. This availability target ensures the service is dependable for daily commuters.
Fault Tolerance	The system must be deployed across a minimum of two Availability Zones (AZs), with automated failover. The failure of one AZ must not impact service availability.	This is a foundational best practice to ensure the workload can withstand the failure of a single data center and continue serving its thousands of users.
Disaster Recovery	RTO = 15 minutes / RPO = 5 minutes.	For a system processing constant real-time updates from 150 buses, this ensures that in a disaster, the service is restored quickly with minimal data loss.
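
The availability target translates into a concrete downtime budget. The sketch below works it out for the 16 daily operating hours, assuming a 30-day month; the function name is hypothetical.

```python
def monthly_downtime_budget_minutes(availability: float, hours_per_day: float, days: int = 30) -> float:
    """Minutes of allowed downtime per month within the measured operating window."""
    measured_minutes = hours_per_day * days * 60
    return (1 - availability) * measured_minutes


budget = monthly_downtime_budget_minutes(0.999, hours_per_day=16)
rto_minutes = 15
print(f"Downtime budget: {budget:.1f} min/month")
print(f"One incident at full RTO uses {rto_minutes / budget:.0%} of the budget")
```

Under these assumptions the budget is about 28.8 minutes per month, so a single recovery that takes the full 15-minute RTO would consume roughly half of it.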

Pillar: Security

Focuses on protecting information and systems.

Requirement	Metric	Linked Capacity Assumption & Justification
User Data Protection	100% of data for the 20,000+ users (and future growth) must be encrypted at rest (AES-256) and in transit (TLS 1.2+).	Essential for protecting Personally Identifiable Information (PII) and building user trust at scale.
Device Authentication	The 150 bus devices and 500 bus stop devices must authenticate using unique, revocable X.509 certificates.	Prevents unauthorized devices from injecting malicious data into the system, a critical threat vector in any IoT network.
Network Isolation	Application and database tiers must be located in private subnets, isolated from direct public internet access.	This fundamental security posture reduces the attack surface for a system that is, by its nature, publicly exposed via its IoT endpoints and user application.
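
As an illustration of the transport and device authentication requirements, the sketch below configures a server-side TLS context with Python's standard `ssl` module so that it refuses anything below TLS 1.2 and requires a client certificate. The function name and the file-path parameters are hypothetical. On AWS this would typically be handled by a managed service rather than hand-written code, and certificate revocation checks are not shown.

```python
import ssl
from typing import Optional


def build_device_tls_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side context: the backend is the TLS server, devices are the clients."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject TLS 1.1 and older
    ctx.verify_mode = ssl.CERT_REQUIRED           # devices must present a certificate
    if ca_file:
        # CA that issued the device certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and private key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = build_device_tls_context()
print(ctx.minimum_version, ctx.verify_mode)
```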

Pillar: Cost Optimization

Focuses on avoiding unnecessary costs.

Requirement	Metric	Linked Capacity Assumption & Justification
Off-Peak Scaling	Compute and database resources must automatically scale down during off-peak hours (e.g., 10 PM to 6 AM), reducing costs by at least 50% compared to peak-hour expenditure.	Our model assumes significantly lower traffic outside the 16 operating hours. This NFR ensures the architecture leverages that pattern to optimize cost.
Data Lifecycle Management	Raw IoT data (approx. 215 MB/day) must be automatically transitioned from hot storage (e.g., S3 Standard) to archival storage (e.g., S3 Glacier) after 90 days.	Balances the need for recent data for analysis against the cost of keeping a steadily growing volume of historical data in hot storage.
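
The 90-day transition could be expressed as an S3 lifecycle rule. The sketch below only builds the configuration dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`; the rule ID and the `raw/` prefix are assumptions, and no AWS call is made.

```python
def raw_data_lifecycle(prefix: str = "raw/", days_hot: int = 90) -> dict:
    """Build a lifecycle configuration that archives raw IoT data after `days_hot` days."""
    return {
        "Rules": [
            {
                "ID": "archive-raw-iot-data",      # hypothetical rule name
                "Filter": {"Prefix": prefix},      # hypothetical key prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_hot, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


lifecycle = raw_data_lifecycle()
print(lifecycle["Rules"][0]["Transitions"])
```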

Pillar: Operational Excellence

Focuses on running and monitoring systems to deliver business value.

Requirement	Metric	Linked Capacity Assumption & Justification
Centralized Monitoring	All 150 buses and 500 bus stops, plus all backend services, must send metrics and logs to a centralized system (e.g., Amazon CloudWatch).	With hundreds of distributed components, centralized monitoring is the only feasible way to get a holistic view of system health and troubleshoot issues effectively.
Automated Alerting	Automated alarms must trigger when key performance indicators (e.g., API p99 latency > 300ms, ingestion failure rate > 1%) are breached for more than 5 minutes.	For a system of this scale, manual monitoring is impractical. Automated alerts enable a small operations team to manage a large, distributed infrastructure proactively.
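
The "breached for more than 5 minutes" rule can be approximated as a run of consecutive breaching datapoints. The sketch below assumes one datapoint per minute; the function name `alarm_breached` and the sample values are hypothetical, and a monitoring service would normally evaluate this itself.

```python
def alarm_breached(datapoints: list[float], threshold: float, periods: int = 5) -> bool:
    """True when the most recent `periods` datapoints all exceed the threshold."""
    if len(datapoints) < periods:
        return False  # not enough data to decide yet
    return all(value > threshold for value in datapoints[-periods:])


# Hypothetical one-minute p99 latency readings in milliseconds.
sustained = [250, 310, 320, 305, 330, 340]
transient = [250, 310, 290, 305, 330, 340]
print(alarm_breached(sustained, threshold=300))  # True
print(alarm_breached(transient, threshold=300))  # False
```

Requiring a sustained breach avoids paging the operations team for a single slow minute.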

Pillar: Sustainability (Optional)

Focuses on minimizing the environmental impacts of running cloud workloads.

Requirement	Metric	Linked Capacity Assumption & Justification
Workload Scheduling	Non-critical data processing tasks (e.g., generating weekly analytics reports) must be scheduled to run during periods of low energy grid carbon intensity.	Even though the user base is regional, the cloud resources are not. This ensures that the environmental impact of the backend processing is minimized.
Efficient Hardware Selection	When selecting instance types, prioritize ARM-based AWS Graviton processors where workloads are compatible.	Graviton processors offer better performance per watt, reducing the overall energy consumption required to serve the 20,000 daily users.

Out of Scope

To ensure focus and feasibility, the following features are explicitly out of scope for the initial version:

Fare Collection and Payment Processing: The system will not handle ticketing or payments.
Real-time Traffic Light Integration: While a future goal, the initial system will not directly interface with municipal traffic control systems.
B2B Fleet Management Features: The system is designed for public use and will not include features for managing private fleets, such as pre-registered passenger lists.

3. Capacity Estimation

This section outlines the estimated load and data requirements for an initial deployment in a medium-sized city like Da Nang, which has approximately 15 major bus routes.

Assumptions

Note

We deliberately choose fairly large numbers for every aspect of system usage in order to design a comprehensive architecture.

Number of Buses: 150 buses operating across all routes.
Bus Stops: 500 bus stops equipped with hailing buttons.
Operating Hours: 16 hours per day (6 AM to 10 PM).
Peak Hours: 4 hours per day (7-9 AM and 5-7 PM).
Active Users (Passengers): 20,000 daily active users (DAU).
Peak Concurrent Users: 5,000 users during peak hours.

Estimation Tables (temporary)

IoT Data Generation

Data Source	Frequency	Data per Message	Daily Data per Unit	Total Daily Data
Bus GPS Location	1 update / 10 sec	256 bytes	1.4 MB	210 MB
Bus Stop “Hail”	50 hails / day (avg)	128 bytes	6.4 KB	3.2 MB
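
The figures in this table can be reproduced from the assumptions with a short calculation. The sketch below uses only values stated in this document; the helper name `daily_bytes` is hypothetical. Note that the GPS total comes out at about 221 MB in decimal units, which matches the table's ~210 MB only when expressed in binary megabytes (MiB).

```python
OPERATING_SECONDS = 16 * 3600  # 16 operating hours per day


def daily_bytes(messages_per_day: int, bytes_per_message: int, units: int) -> int:
    """Total bytes generated per day by `units` identical devices."""
    return messages_per_day * bytes_per_message * units


gps_messages_per_bus = OPERATING_SECONDS // 10          # one update every 10 s
gps_total = daily_bytes(gps_messages_per_bus, 256, 150)  # 150 buses
hail_total = daily_bytes(50, 128, 500)                   # 500 stops, 50 hails each

print(f"GPS:  {gps_total / 1e6:.1f} MB ({gps_total / 2**20:.1f} MiB)")
print(f"Hail: {hail_total / 1e6:.1f} MB")
```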

API Traffic Estimation

API Endpoint	Calls per Second (Peak)	Calls per Second (Off-Peak)
`POST /bus/location` (from buses)	15 calls/sec	15 calls/sec
`POST /stop/hail` (from stops)	~5 calls/sec	~1 call/sec
`GET /bus/eta` (from apps)	500 calls/sec	50 calls/sec
Total	~520 calls/sec	~66 calls/sec
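
A quick check of the table's arithmetic, using only the values above. The polling interval is not stated anywhere in this document; it is merely what the `GET /bus/eta` peak rate implies when divided into the 5,000 peak concurrent users.

```python
peak = {"POST /bus/location": 15, "POST /stop/hail": 5, "GET /bus/eta": 500}
off_peak = {"POST /bus/location": 15, "POST /stop/hail": 1, "GET /bus/eta": 50}

# Location rate follows directly from 150 buses reporting every 10 seconds.
location_rate = 150 / 10

# Implied (not stated) average interval between ETA requests per concurrent user.
implied_poll_interval_s = 5000 / peak["GET /bus/eta"]

print(f"Peak total:     {sum(peak.values())} calls/sec")
print(f"Off-peak total: {sum(off_peak.values())} calls/sec")
print(f"Implied ETA polling interval: {implied_poll_interval_s:.0f} s per user")
```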

Storage Requirements

Data Type	Daily Volume	Retention Policy	Estimated Yearly Storage
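Raw GPS & Hail Data	~215 MB	90 days in hot storage, then archived	~78 GB written per year (~19 GB in hot storage at any time)
Aggregated Daily Metrics	~20 MB	3 years	~7 GB per year (~22 GB at full 3-year retention, Cold Storage)
User & Route Data	Negligible	Indefinite	~5 GB
Total (First Year)			~90 GB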
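
The relationship between daily volume, retention period, and stored volume can be checked with a short calculation. This sketch uses decimal units (1 GB = 1,000 MB) and the daily volumes from the table; the helper names are hypothetical.

```python
def yearly_gb(daily_mb: float) -> float:
    """Data written over 365 days, in GB."""
    return daily_mb * 365 / 1000


def retained_gb(daily_mb: float, retention_days: int) -> float:
    """Data held at steady state under a fixed retention window, in GB."""
    return daily_mb * retention_days / 1000


raw_yearly = yearly_gb(215)               # raw GPS and hail data written per year
raw_hot = retained_gb(215, 90)            # portion still inside the 90-day window
agg_yearly = yearly_gb(20)                # aggregated metrics written per year
agg_three_years = retained_gb(20, 3 * 365)

print(f"Raw: {raw_yearly:.1f} GB/year, {raw_hot:.1f} GB inside the 90-day window")
print(f"Aggregated: {agg_yearly:.1f} GB/year, {agg_three_years:.1f} GB after 3 years")
```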

Bandwidth Requirements

Traffic Direction	Peak Rate	Sustained Rate
Ingress (from IoT devices)	~5 Mbps	~5 Mbps
Egress (to user apps)	~150 Mbps	~15 Mbps
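
For comparison, the raw payload rates implied by the earlier tables can be computed directly. The sketch below uses the message sizes and peak call rates from this document and ignores protocol overhead (TLS, HTTP or MQTT headers), which is an assumption. The ingress payload alone is far below the ~5 Mbps figure, which is consistent with the deliberately generous sizing noted under Assumptions. The average response size is not stated anywhere; the value printed is only what the egress figure implies.

```python
def to_mbps(bytes_per_second: float) -> float:
    """Convert bytes per second to megabits per second (decimal)."""
    return bytes_per_second * 8 / 1_000_000


# Peak ingress payload: 15 location updates of 256 B plus ~5 hails of 128 B per second.
ingress_payload_bytes = 15 * 256 + 5 * 128
ingress_payload_mbps = to_mbps(ingress_payload_bytes)

# Average response size implied by ~150 Mbps of egress at 500 ETA calls per second.
implied_response_bytes = 150 * 1_000_000 / 8 / 500

print(f"Ingress payload: {ingress_payload_mbps:.3f} Mbps")
print(f"Implied ETA response size: {implied_response_bytes / 1000:.1f} KB")
```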
