LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
Ngoc Phuong Anh Duong, Alexandre Almin, Léo Lemarié, B Ravi Kiran

TL;DR
This paper evaluates how active learning combined with data augmentation can reduce dataset size and annotation costs for LiDAR-based autonomous driving, reaching full-dataset accuracy with only 60% of the samples from the selected dataset configuration.
Contribution
The paper presents a principled evaluation of active-learning-based dataset distillation with data augmentation on LiDAR point cloud data, using one quarter of the Semantic-KITTI dataset.
Findings
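Data augmentation improves the selection of informative samples to annotate in the active learning loop.
With data augmentation, full-dataset accuracy is reached using only 60% of the samples from the selected dataset configuration.
Training on fewer samples shortens training time and reduces annotation costs.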
Abstract
Autonomous driving (AD) datasets have grown steadily in size over the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL remains relatively unexplored for AD datasets, especially for point cloud data from LiDARs. This paper performs a principled evaluation of AL-based dataset distillation on one quarter of the large Semantic-KITTI dataset. It then demonstrates the gains in model performance due to data augmentation (DA) across different subsets of the AL loop, and shows how DA improves the selection of informative samples to annotate. We observe that data augmentation achieves full-dataset accuracy using only 60% of the samples from the selected dataset configuration. This yields faster training and subsequent savings in annotation costs.
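To make the sample-selection step concrete, the following is a minimal sketch of one uncertainty-based acquisition round. The abstract does not name the acquisition function, so the BALD (mutual information) score used here is an assumption, chosen because it is a common choice in Bayesian active learning. Treating augmented views as additional stochastic forward passes is likewise an illustrative assumption, and the names `bald_scores` and `select_batch` are hypothetical, not taken from the paper.

```python
import numpy as np

def bald_scores(probs, eps=1e-12):
    """Mutual information between predictions and model parameters.

    probs: array of shape (T, N, C) holding class probabilities from
    T stochastic passes (assumed here: dropout masks and/or augmented
    views) for N unlabeled pool samples over C classes.
    Returns an array of shape (N,); higher means more informative.
    """
    mean = probs.mean(axis=0)                                    # (N, C)
    entropy_of_mean = -(mean * np.log(mean + eps)).sum(axis=1)   # (N,)
    mean_of_entropy = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)
    return entropy_of_mean - mean_of_entropy

def select_batch(probs, budget):
    """Return indices of the `budget` highest-scoring pool samples."""
    return np.argsort(-bald_scores(probs))[:budget]

# Toy pool with two samples and eight stochastic passes.
# Sample 0: every pass agrees, so it carries little new information.
agree = np.tile([0.98, 0.01, 0.01], (8, 1))                      # (8, 3)
# Sample 1: passes are individually confident but disagree with each other.
disagree = np.eye(3)[np.arange(8) % 3] * 0.94 + 0.02             # (8, 3)
probs = np.stack([agree, disagree], axis=1)                      # (8, 2, 3)

picked = select_batch(probs, budget=1)
```

In a full loop, the selected samples would be annotated, added to the labeled set, and the model retrained before the next round. In this toy pool, the sample whose passes disagree is the one selected.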
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference
