Depth Is All You Need for Monocular 3D Detection

Dennis Park; Jie Li; Dian Chen; Vitor Guizilini; Adrien Gaidon

arXiv:2210.02493·cs.CV·October 7, 2022

Open Access

TL;DR

This paper adapts a monocular depth representation to the target domain without manual labels, using LiDAR or RGB videos available at training time, and reports state-of-the-art monocular 3D detection as a result.

Contribution

The paper proposes a multi-task learning framework that aligns the depth representation with the target domain using unlabeled LiDAR or RGB video, which improves 3D detection performance.

Findings

01. Improved 3D detection on the KITTI and nuScenes datasets.
02. Two-stage training that first generates pseudo-depth labels is crucial when the depth supervision comes from RGB videos.
03. Achieves state-of-the-art results with the same test-time complexity.

Abstract

A key contributor to recent progress in 3D detection from single images is monocular depth estimation. Existing methods focus on how to leverage depth explicitly, by generating pseudo-pointclouds or providing attention cues for image features. More recent works leverage depth prediction as a pretraining task and fine-tune the depth representation while training it for 3D detection. However, the adaptation is insufficient and is limited in scale by manual labels. In this work, we propose to further align depth representation with the target domain in unsupervised fashions. Our methods leverage commonly available LiDAR or RGB videos during training time to fine-tune the depth representation, which leads to improved 3D detectors. Especially when using RGB videos, we show that our two-stage training by first generating pseudo-depth labels is critical because of the inconsistency in loss…
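As a rough illustration of the multi-task idea in the abstract, the sketch below adds a depth term, supervised by sparse LiDAR depth, to a 3D detection loss. It is not the authors' implementation: the L1 depth loss, the convention that a zero in the LiDAR map means "no measurement", the `depth_weight` parameter, and both function names are assumptions made for this example.

```python
import numpy as np


def masked_l1_depth_loss(pred_depth, lidar_depth):
    """Mean absolute depth error over pixels that have a LiDAR measurement.

    Assumption: lidar_depth is a sparse depth map in which 0 marks pixels
    with no projected LiDAR return.
    """
    valid = lidar_depth > 0
    if not valid.any():
        # No supervision available for this image.
        return 0.0
    return float(np.abs(pred_depth[valid] - lidar_depth[valid]).mean())


def multitask_loss(det_loss, pred_depth, lidar_depth, depth_weight=1.0):
    """Hypothetical combined objective: detection loss plus weighted depth loss."""
    return det_loss + depth_weight * masked_l1_depth_loss(pred_depth, lidar_depth)


# Toy example: a 2x2 predicted depth map and a sparse LiDAR map (metres).
pred = np.full((2, 2), 10.0)
lidar = np.array([[0.0, 12.0], [9.0, 0.0]])
total = multitask_loss(det_loss=2.0, pred_depth=pred, lidar_depth=lidar, depth_weight=0.5)
```

Only the two pixels with LiDAR returns contribute to the depth term, so unlabeled pixels neither help nor penalize the depth prediction. For the RGB-video setting, the abstract describes a two-stage variant in which pseudo-depth labels are generated first; under the same assumptions, such labels could stand in for `lidar_depth` here.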

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques

Methods: ALIGN