DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection
Liang Peng, Xiaopei Wu, Zheng Yang, Haifeng Liu, and Deng Cai

TL;DR
DID-M3D introduces a novel approach to monocular 3D object detection by decoupling instance depth into visual surface depth and attribute depth, improving accuracy and robustness.
Contribution
The paper proposes a new depth reformulation and decoupling strategy that enhances monocular 3D detection performance and data augmentation capabilities.
Findings
Achieves state-of-the-art results on the KITTI dataset.
Effectively disentangles visual and attribute depth uncertainties.
Makes data augmentation more effective for monocular 3D detection.
Abstract
Monocular 3D detection has drawn much attention from the community due to its low cost and setup simplicity. It takes an RGB image as input and predicts 3D boxes in the 3D space. The most challenging sub-task lies in the instance depth estimation. Previous works usually use a direct estimation method. However, in this paper we point out that the instance depth on the RGB image is non-intuitive. It is coupled by visual depth clues and instance attribute clues, making it hard to be directly learned in the network. Therefore, we propose to reformulate the instance depth to the combination of the instance visual surface depth (visual depth) and the instance attribute depth (attribute depth). The visual depth is related to objects' appearances and positions on the image. By contrast, the attribute depth relies on objects' inherent attributes, which are invariant to the object affine…
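The reformulation described in the abstract can be illustrated with a minimal sketch. It assumes the "combination" of the two depths is a plain element-wise sum over a small grid of cells inside an object's 2D box, and that the per-cell results are averaged into one instance depth. The function name `combine_instance_depth`, the grid layout, the averaging step, and the numbers are all hypothetical and chosen for illustration; they are not taken from the authors' implementation.

```python
import numpy as np


def combine_instance_depth(visual_depth, attribute_depth):
    """Combine per-cell visual and attribute depth into an instance depth.

    Assumption (not confirmed by the text above): the combination is an
    element-wise sum, and the per-cell values are averaged into one scalar.
    """
    visual_depth = np.asarray(visual_depth, dtype=float)
    attribute_depth = np.asarray(attribute_depth, dtype=float)
    if visual_depth.shape != attribute_depth.shape:
        raise ValueError("visual and attribute depth grids must share a shape")

    # Per-cell instance depth: surface depth plus the attribute offset.
    per_cell = visual_depth + attribute_depth
    # Hypothetical aggregation: a simple mean over all grid cells.
    return per_cell, float(per_cell.mean())


# Made-up values in metres for a 2x2 grid over one object.
visual = np.array([[20.0, 20.5], [20.25, 20.75]])    # depth of the visible surface
attribute = np.array([[1.0, 0.5], [0.75, 0.25]])     # offset from surface to 3D center

per_cell, instance_depth = combine_instance_depth(visual, attribute)
print(per_cell)
print(instance_depth)
```

In this toy example each cell's surface depth and attribute offset describe the same object center, so every cell agrees on the same instance depth. Splitting the target this way lets the appearance-dependent part and the attribute-dependent part be predicted separately.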
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Image Enhancement Techniques
