CityPulse: Real-Time Traffic Data Analytics and Congestion Prediction
Idriss Djiofack Teledjieu, Irzum Shafique

TL;DR
CityPulse is a scalable, containerized big data pipeline that enables real-time traffic analytics and congestion prediction without physical sensors. It demonstrates high throughput and robustness in support of urban mobility insights.
Contribution
It introduces a novel, containerized architecture for real-time traffic data processing that combines scalable ingestion, buffering, and machine learning for congestion prediction.
Findings
Handles more than 300,000 records per minute with only a minimal increase in latency
Demonstrates robustness and fault tolerance under full load
Provides a cost-effective, reproducible solution for resource-constrained environments
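For scale, the reported throughput converts to a per-second rate as shown below. The second line is a hypothetical illustration only: it assumes that the 11 million simulated records mentioned in the abstract were streamed at a sustained 300,000 records per minute. The paper does not report this as a measured run time.

```latex
\frac{300{,}000\ \text{records}}{60\ \text{s}} = 5{,}000\ \text{records/s}
\qquad
\frac{11{,}000{,}000\ \text{records}}{300{,}000\ \text{records/min}} \approx 36.7\ \text{min}
```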
Abstract
CityPulse is a proof-of-concept big data pipeline designed to enable real-time urban mobility analytics using scalable, containerized components, without reliance on physical sensor infrastructure. The system simulates the ingestion of 11 million traffic-related records representing urban phenomena such as vehicle congestion, GPS coordinates, and weather conditions. Data is ingested through a Dockerized Apache Kafka cluster coordinated by ZooKeeper and processed in real time with Apache Spark Structured Streaming. To ensure robustness under load, the architecture introduces a temporary data storage layer that buffers Spark output before committing it to a centralized data warehouse. This design improves write efficiency and fault tolerance, and it enables batch processing of intermediate results. The refined data feeds into a lightweight machine learning module and is served through a…
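The abstract does not specify the storage engine, schema, or batch size of the temporary storage layer. The following is a minimal sketch of the buffering idea under several assumptions. An in-memory SQLite table stands in for the centralized data warehouse, and a seeded Python generator stands in for the Kafka topic and Spark output. The names `TrafficRecord`, `simulate_records`, `StagingBuffer`, the field names, and the batch size are all hypothetical and are not taken from the paper.

```python
import random
import sqlite3
from dataclasses import dataclass


# Hypothetical record schema, loosely modeled on the fields the abstract lists
# (congestion, GPS coordinates, weather). Not the paper's actual schema.
@dataclass
class TrafficRecord:
    sensor_id: str
    lat: float
    lon: float
    congestion_level: float  # assumed scale: 0.0 (free flow) to 1.0 (gridlock)
    weather: str


def simulate_records(n, seed=0):
    """Yield n synthetic records; stands in for the Kafka/Spark stream here."""
    rng = random.Random(seed)
    for i in range(n):
        yield TrafficRecord(
            sensor_id=f"virtual-{i % 50}",
            lat=round(rng.uniform(40.70, 40.80), 5),
            lon=round(rng.uniform(-74.02, -73.93), 5),
            congestion_level=round(rng.random(), 3),
            weather=rng.choice(["clear", "rain", "snow", "fog"]),
        )


class StagingBuffer:
    """Temporary buffer: accumulate processed rows, commit them in batches."""

    def __init__(self, conn, batch_size=1000):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS traffic ("
            "sensor_id TEXT, lat REAL, lon REAL, congestion_level REAL, weather TEXT)"
        )

    def add(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all pending rows in one transaction; return the row count."""
        if not self.pending:
            return 0
        rows = [
            (r.sensor_id, r.lat, r.lon, r.congestion_level, r.weather)
            for r in self.pending
        ]
        # The connection context manager commits on success and rolls back
        # (then re-raises) on error, so a failed batch stays in self.pending
        # and can be retried instead of being half-written to the warehouse.
        with self.conn:
            self.conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?, ?)", rows)
        written = len(self.pending)
        self.pending.clear()
        return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    buf = StagingBuffer(conn, batch_size=1000)
    for rec in simulate_records(2500):
        buf.add(rec)
    buf.flush()  # commit the final partial batch
    print(conn.execute("SELECT COUNT(*) FROM traffic").fetchone()[0])
```

Batching amortizes the per-commit overhead across many rows, and wrapping each batch in a transaction makes every write all-or-nothing. These two properties correspond to the write-efficiency and fault-tolerance benefits the abstract attributes to the buffering layer. A real deployment would more likely hang this logic off Spark's `foreachBatch` sink than a Python loop.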
Taxonomy
Topics: Traffic Prediction and Management Techniques
