A Data-Driven Novelty Score for Diverse In-Vehicle Data Recording
Philipp Reis, Joshua Ransiek, David Petri, Jacob Langner, Eric Sax

TL;DR
This paper introduces a real-time, object-level novelty detection method using a dynamic Mean Shift algorithm to improve dataset diversity for autonomous driving perception systems, enhancing model robustness.
Contribution
The paper proposes a data-driven novelty score for real-time filtering of in-vehicle data. By identifying and discarding redundant or common scenes, the score yields more balanced datasets.
Findings
Filtering the dataset with the method reduces its size and improves model performance.
Higher data redundancy allows for more aggressive filtering.
The method operates at 32 frames per second and adapts over time.
Abstract
High-quality datasets are essential for training robust perception systems in autonomous driving. However, real-world data collection is often biased toward common scenes and objects, leaving novel cases underrepresented. This imbalance hinders model generalization and compromises safety. The core issue is the curse of rarity. Over time, novel events occur infrequently, and standard logging methods fail to capture them effectively. As a result, large volumes of redundant data are stored, while critical novel cases are diluted, leading to biased datasets. This work presents a real-time data selection method focused on object-level novelty detection to build more balanced and diverse datasets. The method assigns a data-driven novelty score to image frames using a novel dynamic Mean Shift algorithm. It models normal content based on mean and covariance statistics to identify frames with…
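The idea of scoring frames against mean and covariance statistics of "normal" content can be illustrated with a minimal sketch. This is not the paper's dynamic Mean Shift algorithm. It is a simpler stand-in that assumes each frame has already been reduced to a feature vector (for example, an embedding of its detected objects), keeps an exponentially weighted mean and covariance, and uses the Mahalanobis distance as the novelty score. The names `RunningGaussianNovelty` and `select_frames`, and the parameters `forgetting`, `eps`, and `threshold`, are hypothetical.

```python
import numpy as np


class RunningGaussianNovelty:
    """Exponentially weighted mean/covariance model of 'normal' frame features.

    Illustrative stand-in only; not the paper's dynamic Mean Shift algorithm.
    """

    def __init__(self, dim, forgetting=0.05, eps=1e-3):
        self.mean = np.zeros(dim)
        self.cov = np.eye(dim)
        self.forgetting = forgetting  # assumed adaptation rate in (0, 1)
        self.eps = eps                # small ridge term to keep the solve stable

    def score(self, x):
        """Mahalanobis distance of x from the current model (higher = more novel)."""
        diff = np.asarray(x, dtype=float) - self.mean
        cov = self.cov + self.eps * np.eye(diff.shape[0])
        return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

    def update(self, x):
        """Move the statistics toward x so the model adapts over time."""
        a = self.forgetting
        diff = np.asarray(x, dtype=float) - self.mean
        self.mean = self.mean + a * diff
        self.cov = (1.0 - a) * (self.cov + a * np.outer(diff, diff))


def select_frames(features, threshold, forgetting=0.05):
    """Return indices of frames whose novelty score exceeds the threshold."""
    features = np.asarray(features, dtype=float)
    model = RunningGaussianNovelty(features.shape[1], forgetting=forgetting)
    kept = []
    for i, x in enumerate(features):
        if model.score(x) > threshold:
            kept.append(i)   # frame looks unlike what has been seen so far
        model.update(x)      # every frame, kept or not, updates the model
    return kept
```

In this sketch every frame updates the statistics, so content that keeps recurring gradually stops being scored as novel. Raising `threshold` discards more frames, which is the kind of trade-off the Findings describe for highly redundant data.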
