Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion
Xuewu Lin, Tianwei Lin, Zixiang Pei, Lichao Huang, Zhizhong Su

TL;DR
Sparse4D is a sparse spatial-temporal fusion method for multi-view 3D object detection. It iteratively refines anchor boxes by sparsely sampling image features and fusing them hierarchically, outperforming prior sparse methods on nuScenes. The authors present it as suitable for edge deployment.
Contribution
The paper proposes Sparse4D, a new sparse 3D detection approach that combines 4D sampling and hierarchical feature fusion, outperforming existing sparse and many BEV methods.
Findings
Outperforms all sparse-based methods on nuScenes
Surpasses most BEV-based methods in detection accuracy
Efficiently achieves 3D detection without dense view transformation
Abstract
Bird's-eye-view (BEV) based methods have recently made great progress in the multi-view 3D detection task. Compared with BEV-based methods, sparse methods lag behind in performance but still have many non-negligible merits. To push sparse 3D detection further, in this work we introduce a novel method, named Sparse4D, which iteratively refines anchor boxes by sparsely sampling and fusing spatial-temporal features. (1) Sparse 4D Sampling: for each 3D anchor, we assign multiple 4D keypoints, which are then projected to multi-view/scale/timestamp image features to sample the corresponding features; (2) Hierarchy Feature Fusion: we hierarchically fuse the sampled features across views/scales, across timestamps, and across keypoints to generate a high-quality instance feature. In this way, Sparse4D can efficiently and effectively achieve 3D detection without relying on dense…
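The two steps named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the 4x4 projection matrices, the single feature scale, and the use of plain averaging and summation in place of any learned fusion weights are all assumptions made for illustration. Keypoints for one anchor are projected into every view at every timestamp, features are sampled bilinearly, and the samples are fused first across views, then across timestamps, then across keypoints.

```python
import numpy as np

def project_points(points_xyz, proj):
    """Project (K, 3) points with a 4x4 matrix; returns (K, 2) pixels and a depth mask."""
    homo = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    cam = homo @ proj.T                      # (K, 4)
    depth = cam[:, 2]
    valid = depth > 1e-5                     # keep points in front of the camera
    uv = cam[:, :2] / np.maximum(depth, 1e-5)[:, None]
    return uv, valid

def bilinear_sample(feat, uv):
    """Sample a (C, H, W) feature map at (K, 2) pixel coords; zeros outside the map."""
    C, H, W = feat.shape
    out = np.zeros((len(uv), C))
    for k, (u, v) in enumerate(uv):
        if not (0 <= u <= W - 1 and 0 <= v <= H - 1):
            continue
        u0, v0 = int(np.floor(u)), int(np.floor(v))
        u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
        du, dv = u - u0, v - v0
        out[k] = (feat[:, v0, u0] * (1 - du) * (1 - dv)
                  + feat[:, v0, u1] * du * (1 - dv)
                  + feat[:, v1, u0] * (1 - du) * dv
                  + feat[:, v1, u1] * du * dv)
    return out

def sample_and_fuse(keypoints, feats, projs):
    """Hypothetical helper: one anchor's instance feature.

    keypoints: (T, K, 3) keypoint positions per timestamp
    feats:     (T, N, C, H, W) image features per timestamp and view
    projs:     (T, N, 4, 4) projection matrices (assumed given)
    returns:   (C,) fused instance feature
    """
    T, N, C, H, W = feats.shape
    K = keypoints.shape[1]
    sampled = np.zeros((T, N, K, C))
    mask = np.zeros((T, N, K))
    for t in range(T):
        for n in range(N):
            uv, valid = project_points(keypoints[t], projs[t, n])
            inside = (valid & (uv[:, 0] >= 0) & (uv[:, 0] <= W - 1)
                      & (uv[:, 1] >= 0) & (uv[:, 1] <= H - 1))
            sampled[t, n] = bilinear_sample(feats[t, n], uv) * inside[:, None]
            mask[t, n] = inside
    # Level 1: fuse views with a masked mean (stand-in for learned weights).
    per_view = sampled.sum(axis=1) / np.maximum(mask.sum(axis=1), 1)[..., None]
    # Level 2: fuse timestamps with a plain mean.
    per_time = per_view.mean(axis=0)         # (K, C)
    # Level 3: fuse keypoints by summation.
    return per_time.sum(axis=0)              # (C,)
```

In a full detector, the fused feature would feed a head that predicts a refinement of the anchor box, and the sample-and-fuse step would then repeat from the updated anchor. That loop is omitted here because the truncated abstract does not describe it in detail.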
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods
