Revisiting Monocular 3D Object Detection with Depth Thickness Field

Qiude Zhang; Chunyu Lin; Zhijie Shen; Nie Lang; Yao Zhao

arXiv:2412.19165 · cs.CV · March 25, 2025

TL;DR

This paper introduces a novel Depth Thickness Field approach for monocular 3D object detection, improving 3D structure understanding and detection accuracy by embedding scene-level depth information and refining instance-level details.

Contribution

The paper proposes MonoDTF, a new scene-to-instance depth-adapted network with modules for scene-level depth retargeting and instance-level spatial refinement, enhancing 3D detection performance.

Findings

01 Outperforms existing state-of-the-art methods on KITTI and Waymo datasets.

02 Demonstrates universality across different depth estimation models.

03 Improves 3D structure awareness and detection accuracy.

Abstract

Monocular 3D object detection is challenging because accurate depth is not available from a single image. Existing depth-assisted solutions still perform poorly, and the reason is widely taken to be the unsatisfactory accuracy of monocular depth estimation models. In this paper, we revisit monocular 3D object detection from the depth perspective and identify an additional issue: the limited 3D structure-aware capability of existing depth representations (e.g., depth one-hot encoding or depth distribution). To address this issue, we introduce a novel Depth Thickness Field approach to embed clear 3D structures of the scenes. Specifically, we present MonoDTF, a scene-to-instance depth-adapted network for monocular 3D object detection. The framework mainly comprises a Scene-Level Depth Retargeting (SDR) module and an Instance-Level Spatial Refinement (ISR) module. The former…
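For readers unfamiliar with the two existing depth representations the abstract names, the following is a minimal sketch of what "depth one-hot encoding" and "depth distribution" commonly mean for a single pixel. It is not taken from the paper and does not show the Depth Thickness Field itself. The uniform binning scheme and the names `d_min`, `d_max`, `num_bins`, and all function names are assumptions made for illustration.

```python
import math

def depth_bin_index(depth, d_min, d_max, num_bins):
    """Map a metric depth to a uniform bin index, clamped to the valid range.
    Uniform binning is an assumption; other discretizations exist."""
    width = (d_max - d_min) / num_bins
    idx = int((depth - d_min) / width)
    return max(0, min(num_bins - 1, idx))

def depth_one_hot(depth, d_min, d_max, num_bins):
    """One-hot encoding: exactly one bin along the pixel's ray is marked."""
    vec = [0.0] * num_bins
    vec[depth_bin_index(depth, d_min, d_max, num_bins)] = 1.0
    return vec

def depth_distribution(logits):
    """Depth distribution: a softmax over per-bin scores for one pixel."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_depth(probs, d_min, d_max):
    """Collapse a per-bin distribution to a single depth using bin centers."""
    width = (d_max - d_min) / len(probs)
    centers = [d_min + (i + 0.5) * width for i in range(len(probs))]
    return sum(p * c for p, c in zip(probs, centers))

# Hypothetical example values: 8 bins covering 0 to 80 metres.
one_hot = depth_one_hot(12.0, 0.0, 80.0, 8)
uniform = depth_distribution([0.0, 0.0, 0.0, 0.0])
```

In both cases the representation is a per-pixel vector over depth bins along the camera ray: the one-hot form marks a single bin, while the distribution form spreads probability across bins.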

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications · Advanced Vision and Imaging