From Rays to Projections: Better Inputs for Feed-Forward View Synthesis
Zirui Wu, Zeren Jiang, Martin R. Oswald, Jie Song

TL;DR
This paper introduces projective conditioning and a masked autoencoding pretraining strategy to improve the robustness and consistency of feed-forward view synthesis models, achieving state-of-the-art results.
Contribution
It proposes a novel projective cue for conditioning view synthesis models and a tailored pretraining method, enhancing geometric consistency and fidelity.
Findings
Improved view consistency over ray-conditioned models
State-of-the-art quality on view synthesis benchmarks
Effective use of uncalibrated data for pretraining
Abstract
Feed-forward view synthesis models predict a novel view in a single pass with minimal 3D inductive bias. Existing works encode cameras as Plücker ray maps, which tie predictions to the arbitrary world coordinate gauge and make them sensitive to small camera transformations, thereby undermining geometric consistency. In this paper, we ask what inputs best condition a model for robust and consistent view synthesis. We propose projective conditioning, which replaces raw camera parameters with a target-view projective cue that provides a stable 2D input. This reframes the task from a brittle geometric regression problem in ray space to a well-conditioned target-view image-to-image translation problem. Additionally, we introduce a masked autoencoding pretraining strategy tailored to this cue, enabling the use of large-scale uncalibrated data for pretraining. Our method shows improved…
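To make the gauge dependence of Plücker ray maps concrete, the following is a minimal NumPy sketch of the standard per-pixel Plücker encoding (unit direction d and moment o × d). It is not taken from the paper: the function name `plucker_ray_map`, the pinhole intrinsics, and the extrinsic convention x_cam = R x_world + t are all assumptions made for illustration. The example renders the map for one physical camera under two choices of world origin and shows that the encoding changes even though the camera itself does not.

```python
import numpy as np

def plucker_ray_map(K, R, t, height, width):
    """Per-pixel Plücker coordinates (d, o x d), expressed in the world frame.

    Assumed convention (not from the paper): x_cam = R @ x_world + t.
    Returns an array of shape (height, width, 6).
    """
    # Homogeneous pixel centers, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Back-project to camera-frame directions, then rotate into the world frame.
    dirs_cam = pix @ np.linalg.inv(K).T
    dirs_world = dirs_cam @ R  # row-vector form of R.T @ d
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)
    # Camera center in world coordinates and the ray moment o x d.
    origin = -R.T @ t
    moment = np.cross(origin, dirs_world)
    return np.concatenate([dirs_world, moment], axis=-1)

# Hypothetical camera: 64x48 image, focal length 100 px, placed at the world origin.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
rays = plucker_ray_map(K, R, t, height=48, width=64)

# Same physical camera, but the world origin is shifted: x_world' = x_world + g.
g = np.array([1.0, 0.0, 0.0])
t_shifted = t - R @ g
rays_shifted = plucker_ray_map(K, R, t_shifted, height=48, width=64)

# The direction channels agree; the moment channels do not.
same_dirs = np.allclose(rays[..., :3], rays_shifted[..., :3])
same_moments = np.allclose(rays[..., 3:], rays_shifted[..., 3:])
```

Here `same_dirs` is true and `same_moments` is false: a pure relabeling of the world frame alters the conditioning input, which is the kind of sensitivity the abstract attributes to ray-based camera encodings.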
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
