From Rays to Projections: Better Inputs for Feed-Forward View Synthesis

Zirui Wu; Zeren Jiang; Martin R. Oswald; Jie Song

arXiv:2601.05116 · cs.CV · January 9, 2026

TL;DR

This paper introduces projective conditioning and a masked autoencoding pretraining strategy that improve the robustness and consistency of feed-forward view synthesis models, achieving state-of-the-art results on view synthesis benchmarks.

Contribution

The paper proposes a novel projective cue for conditioning view synthesis models, together with a pretraining method tailored to that cue, improving geometric consistency and fidelity.

Findings

01. Improved view consistency over ray-conditioned models

02. State-of-the-art quality on view synthesis benchmarks

03. Effective use of uncalibrated data for pretraining

Abstract

Feed-forward view synthesis models predict a novel view in a single pass with minimal 3D inductive bias. Existing works encode cameras as Plücker ray maps, which tie predictions to the arbitrary world coordinate gauge and make them sensitive to small camera transformations, thereby undermining geometric consistency. In this paper, we ask what inputs best condition a model for robust and consistent view synthesis. We propose projective conditioning, which replaces raw camera parameters with a target-view projective cue that provides a stable 2D input. This reframes the task from a brittle geometric regression problem in ray space to a well-conditioned target-view image-to-image translation problem. Additionally, we introduce a masked autoencoding pretraining strategy tailored to this cue, enabling the use of large-scale uncalibrated data for pretraining. Our method shows improved…
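To make the gauge argument concrete, here is a minimal NumPy sketch (not the authors' code) that builds a per-pixel Plücker ray map for a pinhole camera, assuming the common (d, o x d) convention, and then re-expresses the same cameras in a different world frame. The helper names `plucker_ray_map`, `rot_z`, and `rigid`, as well as the toy intrinsics and poses, are hypothetical choices for illustration. Under a global rigid change of frame x -> R_g x + t_g, each ray direction becomes R_g d and each moment becomes R_g m + t_g x (R_g d), so the ray map a network would receive changes even though the images do not.

```python
import numpy as np


def plucker_ray_map(K, c2w, height, width):
    """Per-pixel 6D Plucker coordinates (d, o x d) for a pinhole camera.

    Assumed convention (illustrative): K is 3x3 intrinsics, c2w is a 4x4
    camera-to-world pose. Returns an array of shape (height, width, 6).
    """
    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)        # (H, W, 3)
    # Back-project to camera-space directions, then rotate into world space.
    dirs = pix @ np.linalg.inv(K).T @ c2w[:3, :3].T          # (H, W, 3)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origin = np.broadcast_to(c2w[:3, 3], dirs.shape)         # camera center o
    moment = np.cross(origin, dirs)                          # m = o x d
    return np.concatenate([dirs, moment], axis=-1)


def rot_z(theta):
    """Rotation by theta radians about the world z-axis (hypothetical helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rigid(R, t):
    """Assemble a 4x4 rigid transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


# Toy setup: one source and one target camera observing the same scene.
K = np.array([[50.0, 0.0, 16.0], [0.0, 50.0, 16.0], [0.0, 0.0, 1.0]])
src_c2w = rigid(np.eye(3), np.array([0.0, 0.0, -2.0]))
tgt_c2w = rigid(rot_z(0.1), np.array([0.3, 0.0, -2.0]))

# A global change of world frame (a different gauge): same scene, same images.
G = rigid(rot_z(0.7), np.array([1.0, -2.0, 0.5]))

ray_before = plucker_ray_map(K, tgt_c2w, 32, 32)
ray_after = plucker_ray_map(K, G @ tgt_c2w, 32, 32)
print(np.allclose(ray_before, ray_after))  # False: the ray input depends on the gauge

rel_before = np.linalg.inv(src_c2w) @ tgt_c2w
rel_after = np.linalg.inv(G @ src_c2w) @ (G @ tgt_c2w)
print(np.allclose(rel_before, rel_after))  # True: the relative pose does not
```

The second check shows that the relative pose between the two cameras is unaffected by the change of frame while the ray map is not, which is the sensitivity the abstract attributes to ray-conditioned models.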

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis