DeLiRa: Self-Supervised Depth, Light, and Radiance Fields

Vitor Guizilini; Igor Vasiljevic; Jiading Fang; Rares Ambrus; Sergey Zakharov; Vincent Sitzmann; Adrien Gaidon

arXiv:2304.02797 · cs.CV · April 7, 2023 · 1 citation

Open Access

TL;DR

DeLiRa introduces a self-supervised, multi-task Transformer-based approach for 3D scene reconstruction that improves volumetric rendering and view synthesis, especially with limited viewpoints, by jointly modeling depth, light, and radiance fields.

Contribution

The paper presents a novel multi-task Transformer architecture that jointly learns depth, light, and radiance fields, enhancing volumetric rendering without increasing network complexity.

Findings

1. Achieves state-of-the-art results on the ScanNet benchmark.

2. Enables real-time novel view and depth synthesis.

3. Improves rendering quality in limited-viewpoint scenarios.

Abstract

Differentiable volumetric rendering is a powerful paradigm for 3D reconstruction and novel view synthesis. However, standard volume rendering approaches struggle with degenerate geometries in the case of limited viewpoint diversity, a common scenario in robotics applications. In this work, we propose to use the multi-view photometric objective from the self-supervised depth estimation literature as a geometric regularizer for volumetric rendering, significantly improving novel view synthesis without requiring additional information. Building upon this insight, we explore the explicit modeling of scene geometry using a generalist Transformer, jointly learning a radiance field as well as depth and light fields with a set of shared latent codes. We demonstrate that sharing geometric information across tasks is mutually beneficial, leading to improvements over single-task learning without…
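The regularizer described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes grayscale images stored as nested lists, a pinhole camera given as a hypothetical `(fx, fy, cx, cy)` tuple, ray samples parameterized by z-depth, and a plain L1 photometric error (self-supervised depth methods commonly add an SSIM term, omitted here). All function names are illustrative. The idea shown is that depth rendered from the volumetric densities is used to warp a target pixel into a neighboring view, and the color mismatch penalizes wrong geometry.

```python
import math


def render_depth(sigmas, ts):
    """Expected depth along one ray from densities (standard volume-rendering quadrature).

    Assumption: ts holds z-depths of the samples, sorted in increasing order.
    """
    depth, transmittance = 0.0, 1.0
    for i, sigma in enumerate(sigmas):
        # Interval to the next sample; the last sample reuses the previous interval.
        delta = ts[i + 1] - ts[i] if i + 1 < len(ts) else ts[i] - ts[i - 1]
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this interval
        depth += transmittance * alpha * ts[i]   # weight_i * t_i
        transmittance *= 1.0 - alpha             # light surviving past sample i
    return depth


def reproject(u, v, depth, intr, R, t):
    """Map target pixel (u, v) with z-depth `depth` into the source view."""
    fx, fy, cx, cy = intr
    # Back-project the pixel to a 3D point in the target camera frame.
    x, y, z = (u - cx) / fx * depth, (v - cy) / fy * depth, depth
    # Rigid transform into the source camera frame: X_s = R X + t.
    xs = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
    ys = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
    zs = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]
    # Project onto the source image plane.
    return fx * xs / zs + cx, fy * ys / zs + cy


def bilinear(img, u, v):
    """Bilinear lookup in an H x W nested list, clamped to the image border."""
    h, w = len(img), len(img[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    au, av = u - u0, v - v0
    top = img[v0][u0] * (1 - au) + img[v0][u1] * au
    bottom = img[v1][u0] * (1 - au) + img[v1][u1] * au
    return top * (1 - av) + bottom * av


def photometric_loss(target_img, source_img, pixels, depths, intr, R, t):
    """Mean L1 error between target pixels and the source image warped by depth."""
    total = 0.0
    for (u, v), d in zip(pixels, depths):
        us, vs = reproject(u, v, d, intr, R, t)
        total += abs(target_img[v][u] - bilinear(source_img, us, vs))
    return total / len(pixels)
```

In a training loop this term would be added to the usual rendering loss, with `depths` produced by `render_depth` for each sampled ray, so that densities yielding geometrically inconsistent depth are penalized even when few viewpoints are available.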

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Multi-Head Attention · Dense Connections · Label Smoothing · Adam · Softmax · Linear Layer · Absolute Position Encodings · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer