Depth Field Networks for Generalizable Multi-view Scene Representation

Vitor Guizilini; Igor Vasiljevic; Jiading Fang; Rares Ambrus; Greg Shakhnarovich; Matthew Walter; Adrien Gaidon

arXiv:2207.14287·cs.CV·July 29, 2022

TL;DR

This paper introduces Depth Field Networks (DeFiNe), which learn implicit, multi-view consistent scene representations using 3D data augmentation and auxiliary view synthesis, achieving state-of-the-art depth estimation and improved zero-shot domain generalization.

Contribution

The paper combines multi-view consistency, 3D data augmentation, and auxiliary view synthesis to improve depth estimation and domain generalization without relying on explicit geometric constraints.

Findings

1. Achieves state-of-the-art stereo and video depth estimation results.

2. Significantly improves zero-shot domain generalization.

3. Outperforms existing methods without relying on explicit geometric constraints.

Abstract

Modern 3D computer vision leverages learning to boost geometric reasoning, mapping image data to classical structures such as cost volumes or epipolar constraints to improve matching. These architectures are specialized according to the particular problem, and thus require significant task-specific tuning, often leading to poor domain generalization performance. Recently, generalist Transformer architectures have achieved impressive results in tasks such as optical flow and depth estimation by encoding geometric priors as inputs rather than as enforced constraints. In this paper, we extend this idea and propose to learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity. We also show that introducing view synthesis as an auxiliary task further improves depth…

Code & Models

Repositories

TRI-ML/vidar (PyTorch)

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Label Smoothing