AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features
Ruochen Zhang, Hyeung-Sik Choi, Dongwook Jung, Phan Huy Nam Anh, Sang-Ki Jeong, Zihao Zhu

TL;DR
AuxDepthNet is a real-time monocular 3D object detection framework that learns depth-sensitive features without external depth maps, achieving state-of-the-art results on KITTI.
Contribution
It introduces the Auxiliary Depth Feature (ADF) and Depth Position Mapping (DPM) modules within a DepthFusion Transformer to improve spatial reasoning and object localization without external depth estimators.
Findings
Achieves state-of-the-art AP scores on the KITTI dataset.
Operates in real-time without external depth maps.
Demonstrates robustness across different difficulty levels.
Abstract
Monocular 3D object detection is a challenging task in autonomous systems due to the lack of explicit depth information in single-view images. Existing methods often depend on external depth estimators or expensive sensors, which increase computational complexity and hinder real-time performance. To overcome these limitations, we propose AuxDepthNet, an efficient framework for real-time monocular 3D object detection that eliminates the reliance on external depth maps or pre-trained depth models. AuxDepthNet introduces two key components: the Auxiliary Depth Feature (ADF) module, which implicitly learns depth-sensitive features to improve spatial reasoning and computational efficiency, and the Depth Position Mapping (DPM) module, which embeds depth positional information directly into the detection process to enable accurate object localization and 3D bounding box regression. Leveraging…
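As a rough illustration of what "embedding depth positional information" into image features could look like, the sketch below predicts a per-pixel distribution over discrete depth bins and adds the expected bin embedding back onto the features. This is not the authors' implementation: the function name `depth_position_embed`, the weight matrices `w_depth` and `bin_embeddings`, the binning scheme, and all shapes are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def depth_position_embed(features, w_depth, bin_embeddings):
    """Hypothetical depth positional mapping (illustrative only).

    features:       (H, W, C) image feature map
    w_depth:        (C, D) projection to logits over D depth bins (assumed)
    bin_embeddings: (D, C) one learnable embedding per depth bin (assumed)
    """
    logits = features @ w_depth          # (H, W, D) per-pixel depth-bin logits
    probs = softmax(logits, axis=-1)     # (H, W, D) per-pixel depth distribution
    depth_pos = probs @ bin_embeddings   # (H, W, C) expected depth embedding
    return features + depth_pos, probs   # fuse by addition, as with positional encodings

# Toy usage with random weights; shapes are arbitrary.
rng = np.random.default_rng(0)
H, W, C, D = 4, 6, 8, 5
features = rng.normal(size=(H, W, C))
w_depth = rng.normal(size=(C, D))
bin_embeddings = rng.normal(size=(D, C))
fused, probs = depth_position_embed(features, w_depth, bin_embeddings)
```

Under these assumptions, the fused map keeps the shape of the input features, so it could be passed to a downstream detection head without any external depth map.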
Taxonomy
Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Absolute Position Encodings · Softmax · Linear Layer · Adam · Residual Connection · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing
