LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation

Ngoc Phuong Anh Duong; Alexandre Almin; Léo Lemarié; B Ravi Kiran

arXiv:2202.02661·cs.CV·February 8, 2022

Open Access

TL;DR

This paper evaluates how active learning combined with data augmentation can reduce dataset size and annotation costs for LiDAR-based autonomous driving, reaching full-dataset accuracy with only 60% of the samples.

Contribution

It presents a principled evaluation of active-learning-based dataset distillation on LiDAR point cloud data (a quarter of Semantic-KITTI) and shows that data augmentation makes the distillation more sample-efficient.

Findings

1. Data augmentation improves the selection of informative samples in active learning.
2. Full-dataset accuracy is reached with only 60% of the samples in the selected dataset configuration.
3. Training is faster and annotation costs are reduced.

Abstract

Autonomous driving (AD) datasets have grown steadily in size over the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL remains relatively unexplored for AD datasets, especially for LiDAR point cloud data. This paper performs a principled evaluation of AL-based dataset distillation on a quarter of the large Semantic-KITTI dataset. Furthermore, it demonstrates the gains in model performance due to data augmentation (DA) across different subsets of the AL loop, and shows how DA improves the selection of informative samples to annotate. With data augmentation, full-dataset accuracy is achieved using only 60% of the samples from the selected dataset configuration, which yields faster training and corresponding savings in annotation costs.
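The abstract does not spell out the acquisition function used in the Bayesian AL loop. As a minimal sketch, assuming a BALD-style (mutual information) score estimated from T stochastic forward passes, such as MC-dropout samples optionally pooled over augmented copies of each scan, the selection step of one AL round could look like the following. The function names `bald_scores` and `select_for_annotation`, the (T, N, P, C) array layout, and the per-point averaging into a scan-level score are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def bald_scores(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scan-level BALD score (mutual information), averaged over points.

    Assumed layout of `probs`: (T, N, P, C) = stochastic passes (e.g. MC
    dropout, possibly over augmented views), scans, points per scan, classes.
    """
    # Predictive distribution: average the softmax outputs over the T passes.
    mean_p = probs.mean(axis=0)  # (N, P, C)
    # Entropy of the averaged prediction (total uncertainty).
    predictive_entropy = -np.sum(mean_p * np.log(mean_p + eps), axis=-1)  # (N, P)
    # Average entropy of each individual pass (aleatoric part).
    expected_entropy = -np.sum(probs * np.log(probs + eps), axis=-1).mean(axis=0)
    # BALD = disagreement between passes; aggregate points into one score per scan.
    return (predictive_entropy - expected_entropy).mean(axis=-1)  # (N,)


def select_for_annotation(
    probs: np.ndarray, labeled_mask: np.ndarray, budget: int
) -> np.ndarray:
    """Return indices of the `budget` unlabeled scans with the highest BALD score."""
    scores = bald_scores(probs)
    candidates = np.flatnonzero(~labeled_mask)  # only unlabeled scans are eligible
    ranked = candidates[np.argsort(-scores[candidates], kind="stable")]
    return ranked[:budget]
```

BALD rewards scans on which the stochastic passes disagree, not scans that are merely uncertain in a consistent way; a scan predicted as 50/50 by every pass scores zero. Under the stated assumption, pooling augmented views into the T axis is one plausible way augmentation could influence which samples get selected.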

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference