Diverse Complexity Measures for Dataset Curation in Self-driving
Abbas Sadat, Sean Segal, Sergio Casas, James Tu, Bin Yang, Raquel Urtasun, Ersin Yumer

TL;DR
This paper introduces a novel data curation method for self-driving datasets that uses diverse criteria to select traffic scenes, improving model generalization across multiple tasks.
Contribution
It proposes a new data selection approach based on diverse interestingness criteria, addressing limitations of fixed-model active learning in self-driving applications.
Findings
Improved model performance and generalization on multiple tasks.
Effective dataset curation leads to better autonomous driving models.
Versatile approach applicable across different models and tasks.
Abstract
Modern self-driving autonomy systems heavily rely on deep learning. As a consequence, their performance is influenced significantly by the quality and richness of the training data. Data collection platforms can generate many hours of raw data on a daily basis; however, it is not feasible to label everything. It is thus of key importance to have a mechanism to identify "what to label". Active learning approaches identify examples to label, but their interestingness is tied to a fixed model performing a particular task. These assumptions are not valid in self-driving, where we have to solve a diverse set of tasks (i.e., perception and motion forecasting) and our models evolve frequently over time. In this paper we introduce a novel approach and propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes. Our experiments on…
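The idea of selecting scenes by several model-independent criteria can be illustrated with a minimal sketch. This is not the authors' method: the excerpt above does not state which measures the paper uses or how it combines them. The measure names (`num_actors`, `ego_speed_change`, `path_curvature`), the rank normalization, and the mean aggregation are all assumptions chosen for illustration.

```python
from typing import Dict, List


def rank_normalize(values: List[float]) -> List[float]:
    """Map values to [0, 1] by rank so measures on different scales are comparable.

    Ties are broken by input order (an arbitrary choice for this sketch).
    """
    n = len(values)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (n - 1)
    return ranks


def select_scenes(scenes: Dict[str, Dict[str, float]], budget: int) -> List[str]:
    """Return the `budget` scene ids with the highest mean normalized score.

    Assumes every scene reports the same set of measures.
    """
    ids = sorted(scenes)
    measures = sorted({m for s in scenes.values() for m in s})
    totals = {sid: 0.0 for sid in ids}
    for m in measures:
        normalized = rank_normalize([scenes[sid][m] for sid in ids])
        for sid, score in zip(ids, normalized):
            totals[sid] += score / len(measures)
    ranked = sorted(ids, key=lambda sid: totals[sid], reverse=True)
    return ranked[:budget]


# Hypothetical per-scene measures; none of these values come from the paper.
scenes = {
    "scene_a": {"num_actors": 3, "ego_speed_change": 0.2, "path_curvature": 0.01},
    "scene_b": {"num_actors": 25, "ego_speed_change": 1.5, "path_curvature": 0.20},
    "scene_c": {"num_actors": 12, "ego_speed_change": 0.1, "path_curvature": 0.05},
    "scene_d": {"num_actors": 1, "ego_speed_change": 2.0, "path_curvature": 0.00},
}
selected = select_scenes(scenes, budget=2)
print(selected)
```

Because the scores in this sketch depend only on properties of the scene and not on any trained model's output, the same selection could be reused across tasks and model versions, which is the limitation of fixed-model active learning that the abstract highlights.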
Taxonomy
Topics: Data Stream Mining Techniques · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
