FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin

TL;DR
FCOS3D is a fully convolutional, single-stage framework for monocular 3D object detection. It adapts advances in 2D detection to a task made hard by the lack of depth information, and it placed 1st among vision-only methods in the nuScenes 3D detection challenge.
Contribution
This work presents a general framework that recasts 3D detection as an image-domain problem: 7-DoF 3D targets are projected into the image and decoupled into 2D and 3D attributes. This removes the need for 2D detection or 2D-3D correspondence priors and improves detection performance.
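To make the decoupling concrete, here is a minimal sketch, assuming a pinhole camera with intrinsics fx, fy, cx, cy and a 7-DoF box (x, y, z, w, l, h, yaw) given in camera coordinates. Grouping the projected-center offset and depth as image-domain attributes and size and yaw as 3D attributes is our reading of the summary; `Box3D`, `decouple_target`, and `lift_center` are hypothetical names, and any depth or angle encoding the paper may apply is omitted.

```python
from dataclasses import dataclass


@dataclass
class Box3D:
    # 7-DoF box in camera coordinates (x right, y down, z forward).
    x: float
    y: float
    z: float
    w: float
    l: float
    h: float
    yaw: float


def project_center(box, fx, fy, cx, cy):
    """Pinhole projection of the 3D box center onto the image plane."""
    if box.z <= 0:
        raise ValueError("box center must lie in front of the camera")
    return fx * box.x / box.z + cx, fy * box.y / box.z + cy


def decouple_target(box, fx, fy, cx, cy, loc, stride):
    """Split a 7-DoF target into image-domain and 3D attributes.

    `loc` is the (u, v) image position of a feature-map location and
    `stride` is that feature level's stride (hypothetical parameters).
    """
    u, v = project_center(box, fx, fy, cx, cy)
    attrs_2d = {
        # Offset from the feature location to the projected 3D center, in strides.
        "offset": ((u - loc[0]) / stride, (v - loc[1]) / stride),
        "depth": box.z,
    }
    attrs_3d = {"size": (box.w, box.l, box.h), "yaw": box.yaw}
    return attrs_2d, attrs_3d


def lift_center(u, v, depth, fx, fy, cx, cy):
    """Back-project an image point with known depth to a 3D camera-frame point."""
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth
```

Because `lift_center` inverts the projection once depth is known, the image-domain targets still determine the full 3D center, which is what allows a 2D-style head to regress them.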
Findings
Achieved 1st place among vision-only methods in the nuScenes 3D detection challenge.
Decouples 7-DoF 3D targets into 2D and 3D attributes in the image domain.
Redefines center-ness as a 2D Gaussian around the projected 3D center.
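The Gaussian center-ness can be sketched as follows. The original FCOS computes center-ness from a location's distances to the 2D box edges; the version below instead decays with distance from the projected 3D center. The exact form exp(-alpha * (dx^2 + dy^2)), the normalization of dx and dy by the feature stride, and the default alpha = 2.5 are assumptions of this sketch, and `centerness_target` is a hypothetical name.

```python
import math


def centerness_target(loc, center_2d, stride, alpha=2.5):
    """Gaussian center-ness target around the projected 3D center.

    loc, center_2d: (u, v) image coordinates of the feature location and the
    projected 3D center. Offsets are measured in units of the feature stride
    (an assumption); alpha controls how fast the target decays.
    """
    dx = (loc[0] - center_2d[0]) / stride
    dy = (loc[1] - center_2d[1]) / stride
    return math.exp(-alpha * (dx * dx + dy * dy))
```

The target equals 1 exactly at the projected center and falls off smoothly, so locations near the 3D center are favored even when that center is far from the middle of the 2D box.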
Abstract
Monocular 3D object detection is an important task for autonomous driving given its low cost. It is much more challenging than conventional 2D detection because the problem is inherently ill-posed, mainly due to the lack of depth information. Recent progress in 2D detection offers opportunities to better solve this problem. However, it is non-trivial to adapt a general 2D detector to this 3D task. In this paper, we study the problem with a practice built on a fully convolutional single-stage detector and propose a general framework, FCOS3D. Specifically, we first transform the commonly defined 7-DoF 3D targets to the image domain and decouple them into 2D and 3D attributes. Then the objects are distributed to different feature levels according to their 2D scales and assigned only according to the projected 3D center for the training…
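The two assignment steps in the abstract, distributing objects to feature levels by 2D scale and then assigning locations according to the projected 3D center, can be sketched as below. The strides and scale ranges in `LEVELS`, the FCOS-style max-distance scale rule, the square sampling window of 1.5 strides, and nearest-center tie-breaking are illustrative assumptions, not values confirmed by this summary; `assign` is a hypothetical helper.

```python
import math

# Hypothetical per-level settings: (stride, lower, upper) bounds on 2D scale.
LEVELS = [(8, 0.0, 48.0), (16, 48.0, 96.0), (32, 96.0, 192.0),
          (64, 192.0, 384.0), (128, 384.0, math.inf)]


def assign(px, py, level, objects, radius=1.5):
    """Return the index of the object assigned to location (px, py), or -1.

    Each object is a dict with 'center' (projected 3D center, (u, v)) and
    'box2d' (x1, y1, x2, y2). The 2D scale uses the FCOS max-distance rule;
    positives must lie within radius * stride of the projected 3D center.
    """
    stride, lo, hi = LEVELS[level]
    best, best_dist = -1, math.inf
    for i, obj in enumerate(objects):
        u, v = obj["center"]
        x1, y1, x2, y2 = obj["box2d"]
        scale = max(px - x1, py - y1, x2 - px, y2 - py)
        if not (lo < scale <= hi):
            continue  # object is handled by a different feature level
        if abs(px - u) > radius * stride or abs(py - v) > radius * stride:
            continue  # location is too far from the projected 3D center
        dist = math.hypot(px - u, py - v)
        if dist < best_dist:
            best, best_dist = i, dist  # ambiguity resolved by center distance
    return best
```

FCOS resolves ambiguous locations by picking the box with the smallest area; choosing the nearest projected center instead keeps the assignment tied to the 3D target.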
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
