Depth Field Networks for Generalizable Multi-view Scene Representation
Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Greg Shakhnarovich, Matthew Walter, Adrien Gaidon

TL;DR
This paper introduces Depth Field Networks (DeFiNe), a novel approach that learns implicit, multi-view consistent scene representations using data augmentation and auxiliary view synthesis, achieving state-of-the-art depth estimation and strong domain generalization.
Contribution
The paper combines multi-view consistency, 3D data augmentation, and auxiliary view synthesis in a single method that improves depth estimation and domain generalization without explicit geometric constraints.
Findings
Achieves state-of-the-art stereo and video depth estimation results.
Significantly improves zero-shot domain generalization.
Outperforms existing methods without explicit geometric constraints.
Abstract
Modern 3D computer vision leverages learning to boost geometric reasoning, mapping image data to classical structures such as cost volumes or epipolar constraints to improve matching. These architectures are specialized according to the particular problem, and thus require significant task-specific tuning, often leading to poor domain generalization performance. Recently, generalist Transformer architectures have achieved impressive results in tasks such as optical flow and depth estimation by encoding geometric priors as inputs rather than as enforced constraints. In this paper, we extend this idea and propose to learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity. We also show that introducing view synthesis as an auxiliary task further improves depth…
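To make "encoding geometric priors as inputs rather than as enforced constraints" concrete, here is a minimal sketch of one common way to turn camera geometry into a per-pixel input: each pixel is described by the origin and unit direction of its viewing ray in world coordinates. This is an illustration under stated assumptions, not the authors' implementation. The function name `pixel_rays`, the pinhole intrinsics `K`, and the 4x4 `cam_to_world` pose convention are all hypothetical choices made for this example.

```python
import numpy as np

def pixel_rays(K, cam_to_world, height, width):
    """Return per-pixel ray origins and unit directions in world coordinates.

    Assumes a pinhole camera with 3x3 intrinsics K and a 4x4 camera-to-world
    pose (both are illustrative conventions, not taken from the paper).
    """
    # Pixel centers in homogeneous image coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Back-project through the inverse intrinsics to camera-frame directions.
    dirs_cam = pixels @ np.linalg.inv(K).T
    # Rotate into the world frame and normalize to unit length.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs_world = dirs_cam @ R.T
    dirs_world = dirs_world / np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    # Every pixel of one camera shares the same ray origin (the camera center).
    origins = np.broadcast_to(t, dirs_world.shape)
    return origins, dirs_world

# Hypothetical 4x3 camera placed at the world origin.
K = np.array([[100.0, 0.0, 1.5],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
origins, directions = pixel_rays(K, np.eye(4), height=3, width=4)
```

A network can consume these rays alongside image features, so the geometry is available as information without being enforced by a cost volume or epipolar constraint. Evaluating the same function with a perturbed `cam_to_world` gives the rays of a virtual camera, which is one simple way to think about increasing view diversity.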
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Label Smoothing
