From Big Data to Fast Data: Towards High-Quality Datasets for Machine Learning Applications from Closed-Loop Data Collection
Philipp Reis, Jacqueline Henle, Stefan Otten, Eric Sax

TL;DR
This paper proposes a real-time, context-aware data collection approach called Fast Data for automotive systems, enhancing data relevance and quality for machine learning applications by shifting data selection to the vehicle.
Contribution
It introduces the concept of Fast Data, enabling on-vehicle, real-time data selection to improve dataset quality and reduce costs in automotive ML development.
Findings
Higher relevance and coverage of critical scenarios in datasets.
Reduced irrelevant data and associated costs.
Supports scalable, cost-effective ML development.
Abstract
The increasing capabilities of machine learning models, such as vision-language and multimodal language models, are placing growing demands on data in automotive systems engineering, making the quality and relevance of collected data enablers for the development and validation of such systems. Traditional Big Data approaches focus on large-scale data collection and offline processing, while Smart Data approaches improve data selection strategies but still rely on centralized and offline post-processing. This paper introduces the concept of Fast Data for automotive systems engineering. The approach shifts data selection and recording onto the vehicle as the data source. By enabling real-time, context-aware decisions on whether and which data should be recorded, data collection can be directly aligned with data quality objectives and collection strategies within a closed-loop. This…
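The closed-loop selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `FastDataSelector`, the string context labels, and the per-context target count are all hypothetical, chosen only to show how a recording decision might depend on what has already been collected.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FastDataSelector:
    """Hypothetical on-vehicle selector (an assumption, not from the paper).

    Records a frame only while its context is still under-covered.
    """
    # Assumed data quality objective: number of samples wanted per context.
    target_per_context: int
    counts: Counter = field(default_factory=Counter)

    def should_record(self, context: str) -> bool:
        # Closed loop: the decision depends on what has already been recorded.
        if self.counts[context] >= self.target_per_context:
            return False
        self.counts[context] += 1
        return True


# Illustrative stream of context labels; real contexts would come from
# on-board perception, which is outside the scope of this sketch.
selector = FastDataSelector(target_per_context=2)
stream = ["highway", "highway", "highway", "pedestrian_crossing", "highway"]
recorded = [ctx for ctx in stream if selector.should_record(ctx)]
```

In this sketch the third and later highway frames are skipped once the assumed target is met, while the rarer pedestrian crossing frame is still recorded. A simple per-context quota is only one possible selection strategy.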
