FocalFormer3D : Focusing on Hard Instance for 3D Object Detection
Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

TL;DR
FocalFormer3D is a 3D object detector built around hard instances. It uses multi-stage query generation to uncover difficult objects that earlier stages miss and a box-level transformer decoder to separate real objects from a large pool of candidates. This focus significantly improves recall and achieves top performance on the nuScenes benchmarks.
Contribution
The paper presents Hard Instance Probing (HIP) and FocalFormer3D, a new detection framework that effectively identifies and excavates hard-to-detect objects in 3D scenes, outperforming existing methods.
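To make "hard-to-detect" and "false negative" concrete, here is a minimal sketch (not taken from the paper) that flags missed ground-truth objects by greedily matching predictions to ground truth on bird's-eye-view center distance, in the spirit of nuScenes-style matching. The function names, the single 2 m threshold, and the 2D center-only representation are assumptions for illustration; real nuScenes evaluation also matches per class and averages over several distance thresholds.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) object center in the BEV plane, in meters


def find_false_negatives(
    gt: Sequence[Point],
    preds: Sequence[Point],
    scores: Sequence[float],
    dist_thresh: float = 2.0,  # hypothetical choice; one of several nuScenes thresholds
) -> List[int]:
    """Greedily match predictions to ground truth; return indices of unmatched GT objects."""
    matched = [False] * len(gt)
    # Highest-confidence predictions claim ground-truth objects first.
    order = sorted(range(len(preds)), key=lambda i: scores[i], reverse=True)
    for i in order:
        best_j, best_d = -1, dist_thresh
        for j, g in enumerate(gt):
            if matched[j]:
                continue  # each GT object can be matched at most once
            d = math.dist(preds[i], g)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j >= 0:
            matched[best_j] = True
    # Whatever remains unmatched was missed by the detector: a false negative.
    return [j for j, m in enumerate(matched) if not m]


def recall(gt, preds, scores, dist_thresh: float = 2.0) -> float:
    """Recall = TP / (TP + FN) under the matching rule above."""
    if not gt:
        return 1.0
    fn = find_false_negatives(gt, preds, scores, dist_thresh)
    return 1.0 - len(fn) / len(gt)


# Toy scene: three objects, but the detector only finds two of them.
gt_centers = [(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)]
pred_centers = [(0.5, 0.0), (10.4, 0.3)]
pred_scores = [0.9, 0.8]
missed = find_false_negatives(gt_centers, pred_centers, pred_scores)  # -> [2]
r = recall(gt_centers, pred_centers, pred_scores)
print(missed, r)
```

Every object left in `missed` is exactly the kind of instance a hard-instance pipeline aims to recover, and each one recovered raises recall.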
Findings
Achieves 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark.
Ranks 1st on the nuScenes LiDAR detection leaderboard.
Improves detection and tracking performance in LiDAR and multi-modal settings.
Abstract
False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the…
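The multi-stage idea can be pictured with a small sketch: each stage proposes queries from a bird's-eye-view heatmap, and regions already claimed by earlier stages are blocked, so later stages are pushed toward objects that were missed. This is an illustrative assumption about the general mechanism, not the authors' implementation. The square blocking window, `k_per_stage`, `radius`, and all function names are hypothetical, and training-time details (how targets are masked, how the box-level decoder rescores candidates) are not modeled here.

```python
from typing import List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (row, col) index into a BEV heatmap


def top_k_cells(heatmap: Sequence[Sequence[float]], k: int, blocked: Set[Cell]) -> List[Cell]:
    """Return the k highest-scoring cells that no earlier stage has claimed."""
    cells = [
        (heatmap[r][c], (r, c))
        for r in range(len(heatmap))
        for c in range(len(heatmap[0]))
        if (r, c) not in blocked
    ]
    cells.sort(key=lambda t: t[0], reverse=True)  # stable sort: ties keep row-major order
    return [cell for _, cell in cells[:k]]


def block_around(cell: Cell, radius: int, rows: int, cols: int) -> Set[Cell]:
    """Square window of cells around a selected query, clipped to the grid."""
    r0, c0 = cell
    return {
        (r, c)
        for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))
    }


def multi_stage_queries(stage_heatmaps, k_per_stage: int, radius: int = 1) -> List[List[Cell]]:
    """Select queries stage by stage, accumulating a mask over already-claimed regions."""
    blocked: Set[Cell] = set()
    per_stage: List[List[Cell]] = []
    for heatmap in stage_heatmaps:
        picks = top_k_cells(heatmap, k_per_stage, blocked)
        per_stage.append(picks)
        rows, cols = len(heatmap), len(heatmap[0])
        for p in picks:
            blocked |= block_around(p, radius, rows, cols)
    return per_stage


# Toy 5x5 heatmap: a confident object at (1, 1), a nearby duplicate response
# at (1, 2), and a faint (hard) object at (3, 3).
heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[1][1] = 0.9
heatmap[1][2] = 0.8
heatmap[3][3] = 0.3

queries = multi_stage_queries([heatmap, heatmap], k_per_stage=1)
print(queries)
```

Without the accumulated mask, the second stage would simply re-propose the duplicate response at (1, 2); with it, the faint object at (3, 3) becomes the second-stage query.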
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
