Monocular Differentiable Rendering for Self-Supervised 3D Object Detection

Deniz Beker; Hiroharu Kato; Mihai Adrian Morariu; Takahiro Ando; Toru Matsuoka; Wadim Kehl; Adrien Gaidon

arXiv:2009.14524·cs.CV·October 1, 2020

TL;DR

This paper introduces a self-supervised approach for 3D object detection from monocular images that leverages differentiable rendering and shape priors, reducing reliance on expensive 3D labels or LiDAR data.

Contribution

It presents a novel method combining differentiable rendering, shape priors, and self-supervision for 3D detection and shape reconstruction from monocular images.

Findings

01

Effective use of noisy monocular depth for 3D detection

02

Achieves competitive accuracy without 3D ground-truth labels

03

Evaluated on real-world data from the KITTI 3D object detection dataset

Abstract

3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale. To overcome this ambiguity, we present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks. Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective derived from a pretrained monocular depth estimation network. We use the KITTI 3D object detection dataset to evaluate the accuracy of the method. Experiments demonstrate that we can effectively use noisy monocular depth and differentiable rendering as an alternative to expensive 3D ground-truth labels or LiDAR information.
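The depth and scale ambiguity described in the abstract can be illustrated with a pinhole camera model. The sketch below is not taken from the paper: the intrinsics (`f`, `cx`, `cy`) and the box corners are hypothetical values chosen only for illustration. It shows that an object twice as large and twice as far away projects to the same pixels, which is why a single image needs extra cues such as shape priors or estimated depth to pin down 3D location.

```python
# Pinhole projection: u = f * X / Z + cx, v = f * Y / Z + cy.
# The intrinsics below are hypothetical, not real KITTI calibration values.
def project(point, f=700.0, cx=600.0, cy=180.0):
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)


def scale_scene(points, s):
    # Scale object size and distance from the camera by the same factor s.
    return [(s * x, s * y, s * z) for (x, y, z) in points]


# A few corners of a hypothetical car-sized box, 10 to 14 m from the camera.
box = [
    (-1.0, 0.5, 10.0), (1.0, 0.5, 10.0),
    (1.0, -1.0, 10.0), (-1.0, -1.0, 10.0),
    (-1.0, 0.5, 14.0), (1.0, 0.5, 14.0),
]

near_pixels = [project(p) for p in box]
far_pixels = [project(p) for p in scale_scene(box, 2.0)]

# Largest pixel difference between the two projections.
max_diff = max(
    abs(a - b)
    for pa, pb in zip(near_pixels, far_pixels)
    for a, b in zip(pa, pb)
)
print("max pixel difference:", max_diff)
```

The two sets of pixel coordinates agree up to floating-point rounding, so the image alone cannot distinguish the small, near box from the large, far one.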
