DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Sergey Zakharov, Vincent Sitzmann, Adrien Gaidon

TL;DR
DeLiRa introduces a self-supervised, multi-task Transformer-based approach for 3D scene reconstruction that improves volumetric rendering and view synthesis, especially with limited viewpoints, by jointly modeling depth, light, and radiance fields.
Contribution
The paper presents a novel multi-task Transformer architecture that jointly learns depth, light, and radiance fields, enhancing volumetric rendering without increasing network complexity.
Findings
Achieves state-of-the-art results on the ScanNet benchmark.
Enables real-time novel view and depth synthesis.
Improves rendering quality in limited viewpoint scenarios.
Abstract
Differentiable volumetric rendering is a powerful paradigm for 3D reconstruction and novel view synthesis. However, standard volume rendering approaches struggle with degenerate geometries in the case of limited viewpoint diversity, a common scenario in robotics applications. In this work, we propose to use the multi-view photometric objective from the self-supervised depth estimation literature as a geometric regularizer for volumetric rendering, significantly improving novel view synthesis without requiring additional information. Building upon this insight, we explore the explicit modeling of scene geometry using a generalist Transformer, jointly learning a radiance field as well as depth and light fields with a set of shared latent codes. We demonstrate that sharing geometric information across tasks is mutually beneficial, leading to improvements over single-task learning without…
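The abstract's central idea is that a depth map obtained from volumetric rendering can feed a multi-view photometric objective. The following is a minimal NumPy sketch of the two generic building blocks involved, not the authors' implementation: standard volume-rendering compositing along one ray (which yields both a color and an expected depth from the same weights), and the reprojection of a target pixel into a context camera using that depth. The function names `render_ray` and `reproject`, the pinhole intrinsics `K`, and the treatment of sample positions `t` as depths along the camera z-axis are all assumptions made for illustration.

```python
import numpy as np

def render_ray(sigma, rgb, t):
    """Composite per-sample density and color along a single ray.

    sigma: (N,) non-negative densities
    rgb:   (N, 3) per-sample colors
    t:     (N,) increasing sample depths (assumed to be z-depths here)
    """
    # Interval lengths; the last interval is treated as effectively unbounded.
    delta = np.diff(t, append=t[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    depth = (weights * t).sum()  # expected depth under the same weights
    return color, depth, weights

def reproject(pixel, depth, K, R, tvec):
    """Map a target-view pixel with known depth into a context camera.

    K is a hypothetical shared 3x3 pinhole intrinsics matrix; (R, tvec)
    is the assumed target-to-context rigid transform.
    """
    p = np.array([pixel[0], pixel[1], 1.0])
    X = depth * (np.linalg.inv(K) @ p)  # 3D point in the target camera frame
    uvw = K @ (R @ X + tvec)            # project into the context camera
    return uvw[:2] / uvw[2]

# Usage: an opaque sample at depth 3 dominates the rendered color and depth.
sigma = np.array([0.0, 0.0, 1e6, 0.0])
rgb = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([1.0, 2.0, 3.0, 4.0])
color, depth, weights = render_ray(sigma, rgb, t)

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
uv = reproject((50.0, 50.0), 2.0, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

In a photometric objective of this kind, the context image would be sampled at the reprojected location and compared against the target pixel's color, so that errors in the rendered depth show up as photometric error. The sampling and the loss itself are omitted from this sketch.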
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
Methods: Multi-Head Attention · Dense Connections · Label Smoothing · Adam · Softmax · Linear Layer · Absolute Position Encodings · Byte Pair Encoding · Residual Connection · Position-Wise Feed-Forward Layer
