AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features

Ruochen Zhang; Hyeung-Sik Choi; Dongwook Jung; Phan Huy Nam Anh; Sang-Ki Jeong; Zihao Zhu

arXiv:2501.03700·cs.CV·January 8, 2025

TL;DR

AuxDepthNet is a real-time monocular 3D object detection framework that learns depth-sensitive features without external depth maps, achieving state-of-the-art results on KITTI.

Contribution

It introduces the Auxiliary Depth Feature (ADF) and Depth Position Mapping (DPM) modules within a DepthFusion Transformer to improve spatial reasoning and object localization without external depth estimators.
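The page does not describe the internals of the DepthFusion Transformer. As a rough illustration of the kind of fusion the name suggests, the NumPy sketch below shows single-head scaled dot-product cross-attention (one of the methods listed in the page taxonomy), in which image-feature tokens act as queries over depth-sensitive tokens, followed by a residual connection. The function name `depth_fusion_attention`, the token counts, and the random weights are hypothetical; the actual layer likely uses multiple heads, feed-forward sublayers, and learned parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def depth_fusion_attention(visual, depth, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention: visual tokens (queries)
    attend to depth-sensitive tokens (keys/values). Not the authors' code."""
    q = visual @ w_q                          # (N, d) queries from image features
    k = depth @ w_k                           # (M, d) keys from depth features
    v = depth @ w_v                           # (M, d) values from depth features
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (N, M) scaled dot products
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return visual + attn @ v, attn            # residual connection


rng = np.random.default_rng(0)
d = 16
visual = rng.standard_normal((6, d))   # e.g. 6 flattened image-feature tokens
depth = rng.standard_normal((4, d))    # e.g. 4 depth-sensitive tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
fused, attn = depth_fusion_attention(visual, depth, w_q, w_k, w_v)
print(fused.shape, attn.shape)  # (6, 16) (6, 4)
```

Keeping the residual path means the visual features pass through unchanged when the attention contribution is small, which is the usual reason Transformer layers wrap attention this way.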

Findings

01

Achieves state-of-the-art AP scores on the KITTI dataset.

02

Operates in real-time without external depth maps.

03

Demonstrates robustness across the KITTI difficulty levels (Easy, Moderate, and Hard).

Abstract

Monocular 3D object detection is a challenging task in autonomous systems due to the lack of explicit depth information in single-view images. Existing methods often depend on external depth estimators or expensive sensors, which increase computational complexity and hinder real-time performance. To overcome these limitations, we propose AuxDepthNet, an efficient framework for real-time monocular 3D object detection that eliminates the reliance on external depth maps or pre-trained depth models. AuxDepthNet introduces two key components: the Auxiliary Depth Feature (ADF) module, which implicitly learns depth-sensitive features to improve spatial reasoning and computational efficiency, and the Depth Position Mapping (DPM) module, which embeds depth positional information directly into the detection process to enable accurate object localization and 3D bounding box regression. Leveraging…
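The abstract says ADF learns depth-sensitive features implicitly and DPM embeds depth positional information into detection, but gives no equations. The NumPy sketch below is one plausible reading under stated assumptions: ADF is treated as a per-pixel classifier over discrete depth bins trained with an auxiliary cross-entropy loss, and DPM as a probability-weighted lookup of a per-bin embedding that is added to the image features. All names (`auxiliary_depth_head`, `depth_position_embedding`, `NUM_BINS`), the 2 to 60 m range, and the origin of the target bins are hypothetical, not the authors' design.

```python
import numpy as np

NUM_BINS = 8               # hypothetical number of discrete depth bins
D_MIN, D_MAX = 2.0, 60.0   # hypothetical depth range in metres


def bin_centers(num_bins=NUM_BINS, d_min=D_MIN, d_max=D_MAX):
    # Uniform bin centres; the paper's actual discretization is not given here.
    edges = np.linspace(d_min, d_max, num_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:])


def softmax(x):
    # Numerically stable softmax over the last axis (the depth bins).
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def auxiliary_depth_head(features, w_depth):
    """Assumed ADF-style head: per-pixel logits over depth bins."""
    return features @ w_depth  # (H*W, NUM_BINS)


def depth_position_embedding(depth_logits, table):
    """Assumed DPM-style mapping: probability-weighted lookup of a
    per-bin embedding, plus the expected depth for each pixel."""
    probs = softmax(depth_logits)              # (H*W, NUM_BINS)
    expected_depth = probs @ bin_centers()     # (H*W,)
    return probs @ table, expected_depth       # (H*W, C), (H*W,)


def auxiliary_depth_loss(depth_logits, gt_bins):
    # Mean cross-entropy between predicted bin logits and target bin indices.
    z = depth_logits - depth_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(gt_bins)), gt_bins].mean()


rng = np.random.default_rng(0)
hw, c = 12, 32                                      # 12 flattened pixels, 32 channels
features = rng.standard_normal((hw, c))
w_depth = rng.standard_normal((c, NUM_BINS)) * 0.1
table = rng.standard_normal((NUM_BINS, c)) * 0.02   # would be learned in practice

logits = auxiliary_depth_head(features, w_depth)
pos_emb, exp_depth = depth_position_embedding(logits, table)
depth_aware = features + pos_emb                    # features passed on to the detector
gt_bins = rng.integers(0, NUM_BINS, size=hw)        # hypothetical targets (e.g. from 3D boxes)
loss = auxiliary_depth_loss(logits, gt_bins)
print(depth_aware.shape, exp_depth.shape)  # (12, 32) (12,)
```

Using a soft, probability-weighted lookup instead of an argmax keeps the embedding differentiable with respect to the depth logits, so gradients from the detection loss can also shape the auxiliary depth head.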

Taxonomy

Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Softmax · Linear Layer · Adam · Residual Connection · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing