Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs
Abdul Rashid Mussah, Maged Shoman, Mark Amo-Boateng, Yaw Adu-Gyamfi

TL;DR
This paper presents a GPU-accelerated ETL pipeline for statewide connected vehicle data that cuts processing time from 48 hours to 25 minutes, enabling real-time insights for transportation management.
Contribution
The paper introduces a GPU-based data processing framework, built on NVIDIA RAPIDS and Dask, that accelerates extract, transform, load (ETL) processing of connected vehicle (CV) data by 70x, facilitating real-time analysis.
Findings
70x faster data processing for Missouri CV data
Reduced ETL time from 48 hours to 25 minutes
Enables real-time transportation infrastructure insights
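As a quick arithmetic check on the figures listed above, the speedup implied by the two quoted durations can be computed directly. The result is higher than 70x, which may mean the two figures refer to different stages or baselines. That reading is an assumption; this summary does not say how either number was measured.

```python
# Figures quoted in the Findings list; nothing here is measured independently.
baseline_minutes = 48 * 60      # 48 hours expressed in minutes
accelerated_minutes = 25        # reported GPU pipeline time

# Ratio implied by the two durations alone.
implied_speedup = baseline_minutes / accelerated_minutes
print(f"Implied end-to-end speedup: {implied_speedup:.1f}x")

# Baseline that a 70x speedup would correspond to for a 25-minute run.
implied_baseline_hours = 70 * accelerated_minutes / 60
print(f"Baseline implied by 70x: {implied_baseline_hours:.1f} hours")
```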
Abstract
Real-time traffic and sensor data from connected vehicles have the potential to provide insights that will lead to the immediate benefit of efficient management of the transportation infrastructure and related adjacent services. However, the growth of electric vehicles (EVs) and connected vehicles (CVs) has generated an abundance of CV data and sensor data that has put a strain on the processing capabilities of existing data center infrastructure. As a result, the benefits are either delayed or not fully realized. To address this issue, we propose a solution for processing state-wide CV traffic and sensor data on GPUs that provides real-time micro-scale insights in both temporal and spatial dimensions. This is achieved through the use of the Nvidia Rapids framework and the Dask parallel cluster in Python. Our findings demonstrate a 70x acceleration in the extraction, transformation, and…
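The partition-parallel ETL pattern the abstract describes can be sketched as follows. This is a CPU-only stand-in that uses only the Python standard library, not the authors' pipeline: a thread pool plays the role that a Dask cluster of GPU workers would play, and plain Python lists stand in for GPU dataframes. The column names (`journey_id`, `timestamp_s`, `speed_kph`), the sample rows, and every function name are hypothetical, since the summary does not give the CV data schema.

```python
import csv
import io
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data; the real CV schema is not given in this summary.
RAW_CSV = """journey_id,timestamp_s,speed_kph
a,0,50
a,10,70
b,0,30
b,5,40
b,20,50
c,3,90
"""

def extract(text):
    """Extract: parse CSV text into (journey_id, timestamp, speed) tuples."""
    return [
        (rec["journey_id"], float(rec["timestamp_s"]), float(rec["speed_kph"]))
        for rec in csv.DictReader(io.StringIO(text))
    ]

def partition(rows, n_partitions):
    """Route every row of a journey to the same partition via a stable hash."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[zlib.crc32(row[0].encode()) % n_partitions].append(row)
    return parts

def transform(rows):
    """Transform: per-journey duration, mean speed, and point count."""
    grouped = defaultdict(list)
    for journey_id, ts, speed in rows:
        grouped[journey_id].append((ts, speed))
    out = {}
    for journey_id, points in grouped.items():
        times = [t for t, _ in points]
        speeds = [s for _, s in points]
        out[journey_id] = {
            "duration_s": max(times) - min(times),
            "mean_speed_kph": sum(speeds) / len(speeds),
            "n_points": len(points),
        }
    return out

def run_etl(text, n_partitions=2):
    """Run extract, parallel per-partition transform, then load into one dict."""
    parts = partition(extract(text), n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(transform, parts))
    loaded = {}
    for partial in partials:
        loaded.update(partial)  # journeys never span partitions, so no key clashes
    return loaded

summary = run_etl(RAW_CSV)
for journey_id in sorted(summary):
    print(journey_id, summary[journey_id])
```

Partitioning by journey ID keeps each journey's rows together, so each worker can aggregate its partition independently and the final load step is a simple merge.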
Taxonomy
Topics: Traffic Prediction and Management Techniques · Data Management and Algorithms · Graph Theory and Algorithms
