D-Align: Dual Query Co-attention Network for 3D Object Detection Based on Multi-frame Point Cloud Sequence
Junhyung Lee, Junho Koh, Youngwoo Lee, Jun Won Choi

TL;DR
D-Align introduces a dual-query co-attention network that leverages multi-frame point cloud sequences to enhance 3D object detection accuracy by aligning and aggregating spatio-temporal features.
Contribution
It proposes a novel dual-query co-attention network for effectively integrating temporal information in 3D object detection from point cloud sequences.
Findings
Significantly outperforms existing 3D detectors on the nuScenes dataset.
Improves detection accuracy over single-frame baseline methods.
Effectively aligns and aggregates features from multiple frames.
Abstract
LiDAR sensors are widely used for 3D object detection in various mobile robotics applications. LiDAR sensors continuously generate point cloud data in real-time. Conventional 3D object detectors detect objects using a set of points acquired over a fixed duration. However, recent studies have shown that the performance of object detection can be further enhanced by utilizing spatio-temporal information obtained from point cloud sequences. In this paper, we propose a new 3D object detector, named D-Align, which can effectively produce strong bird's-eye-view (BEV) features by aligning and aggregating the features obtained from a sequence of point sets. The proposed method includes a novel dual-query co-attention network that uses two types of queries, including target query set (T-QS) and support query set (S-QS), to update the features of target and support frames, respectively. D-Align…
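The abstract's idea of letting target-frame and support-frame features update each other can be illustrated with a generic bidirectional cross-attention step. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`cross_attend`, `co_attention_step`), the feature shapes, and the absence of learned projections are all hypothetical simplifications, and the actual T-QS and S-QS design is not described in the text above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, context):
    """Scaled dot-product attention: each query row gathers from context rows.

    Assumption: no learned query/key/value projections, for brevity.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return weights @ context


def co_attention_step(target_feats, support_feats):
    """One hypothetical co-attention step with residual updates.

    Target features query the support frame, and support features
    query the target frame; both use the pre-update features.
    """
    new_target = target_feats + cross_attend(target_feats, support_feats)
    new_support = support_feats + cross_attend(support_feats, target_feats)
    return new_target, new_support


# Toy example: 6 flattened BEV cells per frame, 8 channels each (made-up sizes).
rng = np.random.default_rng(0)
target = rng.standard_normal((6, 8))
support = rng.standard_normal((6, 8))
updated_target, updated_support = co_attention_step(target, support)
print(updated_target.shape, updated_support.shape)
```

In this sketch both frames are updated from the same pre-update features, so the order of the two calls does not matter. A real detector would add learned projections, positional information, and multiple support frames.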
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
