Input-level Inductive Biases for 3D Reconstruction
Wang Yifan, Carl Doersch, Relja Arandjelović, João Carreira, Andrew Zisserman

TL;DR
This paper explores how to feed geometric inductive biases directly as extra inputs to general-purpose models, enabling effective multi-view depth estimation without specialized 3D architectures.
Contribution
It introduces input encodings for geometric biases (cameras, projective ray incidence, and epipolar geometry) that let existing domain-agnostic models such as Perceivers perform 3D reconstruction without architectural changes.
Findings
Competitive multi-view depth estimation performance
Effective encoding of camera and geometric information
Maintains data efficiency of bespoke models
Abstract
Much of the recent progress in 3D vision has been driven by the development of specialized architectures that incorporate geometrical inductive biases. In this paper we tackle 3D reconstruction using a domain agnostic architecture and study how instead to inject the same type of inductive biases directly as extra inputs to the model. This approach makes it possible to apply existing general models, such as Perceivers, on this rich domain, without the need for architectural changes, while simultaneously maintaining data efficiency of bespoke models. In particular we study how to encode cameras, projective ray incidence and epipolar geometry as model inputs, and demonstrate competitive multi-view depth estimation performance on multiple benchmarks.
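To make the idea of "encoding cameras as model inputs" concrete, here is a minimal sketch of one way per-pixel ray information could be computed and attached to an image. It is an illustration under stated assumptions, not the paper's exact encoding: it assumes a pinhole camera with intrinsics K and a world-to-camera pose x_cam = R @ x_world + t, and the function names pixel_rays and ray_features are hypothetical.

```python
import numpy as np


def pixel_rays(K, R, t, height, width):
    """Camera centre and unit ray direction per pixel, in world coordinates.

    Assumes a pinhole camera and the convention x_cam = R @ x_world + t.
    """
    # Pixel centres in homogeneous image coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)

    # Back-project through the intrinsics: d_cam = K^{-1} p (row-vector form).
    dirs_cam = pixels @ np.linalg.inv(K).T

    # Rotate into the world frame: d_world = R^T d_cam (row-vector form).
    dirs_world = dirs_cam @ R
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Camera centre in world coordinates.
    origin = -R.T @ t
    return origin, dirs_world


def ray_features(K, R, t, height, width):
    """Stack ray origin and direction into a (H, W, 6) per-pixel feature map."""
    origin, dirs = pixel_rays(K, R, t, height, width)
    origins = np.broadcast_to(origin, dirs.shape)
    return np.concatenate([origins, dirs], axis=-1)
```

A feature map like this could be concatenated channel-wise with the RGB values of each view before the pixels are passed to a general model, so that the geometry arrives as data rather than as architecture.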
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Image Processing Techniques and Applications
Methods: Perceiver IO
