VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation

Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki

arXiv:2411.13186 · cs.CV · November 21, 2024

TL;DR

VADet introduces an adaptive multi-frame aggregation method for LiDAR 3D object detection that adjusts the number of aggregated frames per object based on observed properties such as speed and point density, improving detection performance.

Contribution

The paper presents VADet, an adaptive aggregation approach that improves LiDAR 3D detection by choosing the number of aggregated frames separately for each object, which avoids the trade-offs of fixed, scene-wide aggregation.

Findings

01. Achieves state-of-the-art results on the Waymo dataset.

02. Improves detection accuracy through adaptive, per-object frame aggregation.

03. Compatible with multiple detector architectures; demonstrated on three popular single-stage detectors.

Abstract

Input aggregation is a simple technique used by state-of-the-art LiDAR 3D object detectors to improve detection. However, increasing aggregation is known to yield diminishing returns and can even degrade performance, because objects respond differently to the number of aggregated frames. To address this limitation, we propose an efficient adaptive method, which we call Variable Aggregation Detection (VADet). Instead of aggregating the entire scene using a fixed number of frames, VADet performs aggregation per object, with the number of frames determined by an object's observed properties, such as speed and point density. VADet thus reduces the inherent trade-offs of fixed aggregation and is not architecture-specific. To demonstrate its benefits, we apply VADet to three popular single-stage detectors and achieve state-of-the-art performance on the Waymo dataset.
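The abstract does not say how the frame count is chosen or how points are gathered per object, so what follows is only a minimal Python/NumPy sketch of the general idea under stated assumptions, not VADet's implementation. It contrasts fixed aggregation (each past sweep motion-compensated into the current frame with a 4x4 ego pose and stacked with a time-lag channel, a common convention) with a per-object variant. The helper names, the `choose_num_frames` rule and its constants, the object dictionaries (assumed to come from some preliminary detection pass) and the axis-aligned boxes are all hypothetical.

```python
import numpy as np


def to_current_frame(points, pose):
    """Map (N, 3) xyz points into the current frame with a 4x4 pose."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ pose.T)[:, :3]


def aggregate_fixed(sweeps, poses, num_frames):
    """Fixed aggregation: motion-compensate the `num_frames` most recent
    sweeps (sweeps[0] is the current one) and append a time-lag channel."""
    chunks = []
    for lag in range(min(num_frames, len(sweeps))):
        xyz = to_current_frame(sweeps[lag], poses[lag])
        lag_col = np.full((len(xyz), 1), float(lag))
        chunks.append(np.hstack([xyz, lag_col]))
    return np.vstack(chunks)


def choose_num_frames(speed, num_points, max_frames=10,
                      speed_scale=2.0, dense_points=200):
    """HYPOTHETICAL rule, not from the paper: fast objects get fewer frames
    (less motion smear), sparse objects get more frames (more points)."""
    by_speed = max_frames / (1.0 + speed / speed_scale)
    density = min(num_points, dense_points) / dense_points
    by_density = 1.0 + (max_frames - 1) * (1.0 - density)
    return int(np.clip(round(min(by_speed, by_density)), 1, max_frames))


def aggregate_variable(sweeps, poses, objects, max_frames=10):
    """Per-object aggregation sketch. Background points come from the current
    sweep only; points inside each object's hypothetical axis-aligned box
    (center, half_size) are gathered from that object's own frame count."""
    chunks = [aggregate_fixed(sweeps, poses, 1)]
    for obj in objects:
        k = choose_num_frames(obj["speed"], obj["num_points"], max_frames)
        multi = aggregate_fixed(sweeps, poses, k)
        inside = np.all(np.abs(multi[:, :3] - obj["center"]) <= obj["half_size"], axis=1)
        # Skip lag-0 points: the current sweep is already included above.
        chunks.append(multi[inside & (multi[:, 3] > 0)])
    return np.vstack(chunks)


# Toy usage: three sweeps with identity ego poses and one slow, sparse object.
sweeps = [np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]),
          np.array([[0.1, 0.0, 0.0], [5.0, 5.0, 5.1]]),
          np.array([[0.2, 0.0, 0.0], [9.0, 9.0, 9.0]])]
poses = [np.eye(4)] * 3
parked = {"center": np.zeros(3), "half_size": np.full(3, 0.5),
          "speed": 0.0, "num_points": 0}
print(aggregate_variable(sweeps, poses, [parked]).shape)  # (4, 4)
```

Because each box is expressed in the current frame, past points of a moving object drift out of it; the hypothetical speed term keeps such objects to few frames, which mirrors the trade-off the abstract describes. Points that fall inside overlapping boxes would be duplicated, which a real implementation would need to handle.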
