CAM3DNet: Comprehensively mining the multi-scale features for 3D Object Detection with Multi-View Cameras
Mingxi Pang, Dingheng Wang, Zekun Li, Zhenping Sun, Bo Wang, Zhihang Wang, Zhao-Xu Yang

TL;DR
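CAM3DNet is a sparse query-based framework for 3D object detection from multi-view camera images. It exploits multi-scale spatiotemporal features through three modules, composite query (CQ), adaptive self-attention (ASA), and multi-scale hybrid sampling (MSHS), and is reported to outperform existing camera-based methods.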
Contribution
The paper introduces CAM3DNet, which combines composite query, adaptive self-attention, and multi-scale hybrid sampling modules for improved multi-view 3D detection.
Findings
Outperforms existing camera-based 3D detection methods on nuScenes, Waymo, and Argoverse datasets.
Demonstrates effectiveness of the three proposed modules through extensive experiments.
Provides ablation studies on the individual contributions and computational costs of each module.
Abstract
Query-based 3D object detection methods using multi-view images often struggle to leverage dynamic multi-scale information efficiently: for example, the relationship between the object features and the geometry of the queries is not sufficiently learned, and directly exploring the multi-scale spatiotemporal features is too costly. To address these challenges, we propose CAM3DNet, a novel sparse query-based framework that combines three new modules: composite query (CQ), adaptive self-attention (ASA), and multi-scale hybrid sampling (MSHS). First, the core idea of the CQ module is a multi-scale projection strategy that transforms 2D queries into 3D space. Second, the ASA module learns the interactions between the spatiotemporal multi-scale queries. Third, the MSHS module uses the deformable attention mechanism to sample multi-scale object information by considering multi-scale queries,…
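The abstract does not spell out how the CQ module projects 2D queries into 3D space, so the following is only a minimal sketch of the general idea, using standard pinhole back-projection rather than the paper's actual method. It assumes a query is anchored at a feature-map cell of a given stride and lifted along its camera ray at several depth hypotheses. The names `backproject` and `lift_query`, the cell-centre convention, and the depth list are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift image pixel (u, v) at a given depth into camera-frame 3D
    coordinates using the pinhole model (standard geometry, not the
    paper's specific formulation)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_query(cell: Tuple[int, int], stride: int, depths: Sequence[float],
               fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Hypothetical helper: map a feature-map cell (col, row) at the given
    stride to its pixel centre, then lift it at every depth hypothesis."""
    col, row = cell
    # Assumption: a cell's centre sits half a cell in from its top-left corner.
    u = (col + 0.5) * stride
    v = (row + 0.5) * stride
    return [backproject(u, v, d, fx, fy, cx, cy) for d in depths]


if __name__ == "__main__":
    # One 2D query on a stride-8 feature map, lifted at two assumed depths.
    for point in lift_query((10, 5), 8, [5.0, 10.0], 100.0, 100.0, 84.0, 44.0):
        print(point)
```

Calling `lift_query` for the same object on feature maps of different strides gives candidate 3D positions at several scales, which is one plausible reading of a multi-scale projection. The paper itself should be consulted for the actual design.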
