FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

Yilun Chen; Zhiding Yu; Yukang Chen; Shiyi Lan; Animashree Anandkumar; Jiaya Jia; Jose Alvarez

arXiv:2308.04556·cs.CV·August 10, 2023·5 cites

TL;DR

FocalFormer3D introduces a novel 3D object detection method that emphasizes hard instance detection, significantly improving recall and achieving top performance on nuScenes benchmarks by focusing on difficult objects through multi-stage query generation and transformer decoding.

Contribution

The paper presents Hard Instance Probing (HIP) and FocalFormer3D, a new detection framework that effectively identifies and excavates hard-to-detect objects in 3D scenes, outperforming existing methods.

Findings

01 Achieves 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark.

02 Ranks 1st on the nuScenes LiDAR detection leaderboard.

03 Improves detection and tracking performance in LiDAR and multi-modal settings.

Abstract

False negatives (FN) in 3D object detection, e.g., missing predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. While being fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides the models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features a multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the…
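
The multi-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes candidates are read off a 2D score heatmap, and that each stage masks out the cells claimed by earlier stages (plus a small neighbourhood) so that later stages are forced to look at weaker, harder responses. The function name `multi_stage_candidates` and the parameters `num_stages`, `k`, and `radius` are hypothetical and chosen only for illustration.

```python
import numpy as np


def multi_stage_candidates(heatmap, num_stages=3, k=2, radius=1):
    """Pick top-k heatmap cells per stage, hiding earlier picks from later stages.

    Illustrative assumption only: the heatmap layout, the masking rule, and all
    parameter names are hypothetical, not taken from the paper or its code.
    """
    h, w = heatmap.shape
    mask = np.zeros((h, w), dtype=bool)  # cells already claimed by earlier stages
    stages = []
    for _ in range(num_stages):
        # Masked cells get -inf so they can never be selected again.
        scores = np.where(mask, -np.inf, heatmap)
        order = np.argsort(scores, axis=None)[::-1][:k]  # highest scores first
        picks = []
        for idx in order:
            r, c = divmod(int(idx), w)
            if np.isfinite(scores[r, c]):
                picks.append((r, c))
        # Hide each pick and its neighbourhood from all later stages.
        for r, c in picks:
            mask[max(0, r - radius):r + radius + 1,
                 max(0, c - radius):c + radius + 1] = True
        stages.append(picks)
    return stages
```

In this toy setting, strong peaks are consumed in the first stage, and a weak peak that would otherwise lose the top-k competition surfaces in a later stage once the easy responses are masked.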

Code & Models

Repositories

NVlabs/FocalFormer3D (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques