Scaling View Synthesis Transformers

Evan Kim; Hyunwoo Ryu; Thomas W. Mitchel; Vincent Sitzmann

arXiv:2602.21341 · cs.CV · February 26, 2026

Open Access

TL;DR

This paper systematically studies the scaling laws of geometry-free view synthesis transformers, introduces the SVSM architecture, and demonstrates its compute efficiency and superior performance on real-world benchmarks.

Contribution

The paper shows that encoder-decoder architectures can be compute-optimal for view synthesis, and it provides design principles for training such models efficiently.

Findings

01

Encoder-decoder models scale as effectively as decoder-only models.

02

SVSM achieves a better performance-compute Pareto frontier.

03

SVSM surpasses previous state-of-the-art with less training compute.
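
To make the term "performance-compute Pareto frontier" concrete, here is a minimal sketch of how such a frontier can be extracted from a set of training runs. The run names, FLOP counts, and PSNR values are made up for illustration and are not results from the paper; the function `pareto_frontier` is a hypothetical helper, not the authors' code.

```python
def pareto_frontier(runs):
    """Return the runs that are not dominated by any other run.

    Each run is a (name, train_flops, psnr) tuple. A run is kept only if
    it reaches a strictly higher PSNR than every run that used equal or
    less training compute.
    """
    # Sort by compute ascending; at equal compute, higher PSNR comes first.
    ordered = sorted(runs, key=lambda r: (r[1], -r[2]))
    frontier = []
    best_psnr = float("-inf")
    for run in ordered:
        if run[2] > best_psnr:
            frontier.append(run)
            best_psnr = run[2]
    return frontier


# Hypothetical numbers for illustration only; not taken from the paper.
runs = [
    ("model_a_small", 1e18, 24.0),
    ("model_b_small", 1e18, 23.1),
    ("model_a_medium", 4e18, 26.2),
    ("model_b_medium", 5e18, 25.8),
    ("model_b_large", 2e19, 27.5),
]

frontier = pareto_frontier(runs)
frontier_names = [name for name, _, _ in frontier]
print(frontier_names)
```

One model family has a better frontier than another when, at each compute budget, its frontier points sit at equal or higher quality.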

Abstract

Geometry-free view synthesis transformers have recently achieved state-of-the-art performance in Novel View Synthesis (NVS), outperforming traditional approaches that rely on explicit geometry modeling. Yet the factors governing their scaling with compute remain unclear. We present a systematic study of scaling laws for view synthesis transformers and derive design principles for training compute-optimal NVS models. Contrary to prior findings, we show that encoder-decoder architectures can be compute-optimal; we trace earlier negative results to suboptimal architectural choices and comparisons across unequal training compute budgets. Across several compute levels, we demonstrate that our encoder-decoder architecture, which we call the Scalable View Synthesis Model (SVSM), scales as effectively as decoder-only models, achieves a superior performance-compute Pareto frontier, and surpasses…
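
Scaling-law studies commonly summarize how loss falls with training compute by fitting a power law. The sketch below assumes the form loss = a · compute^(-b) and fits it by least squares in log-log space. The functional form, the helper `fit_power_law`, and the synthetic data are assumptions for illustration; the abstract does not state which form the authors fit.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) in log-log space.

    Returns (a, b). Taking logs gives log(loss) = log(a) - b * log(compute),
    so an ordinary linear regression recovers both parameters.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from a known power law (a = 2.0, b = 0.05),
# used only to check that the fit recovers the parameters.
compute = [1e17, 1e18, 1e19, 1e20]
loss = [2.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Comparing the fitted exponent `b` across architectures is one simple way to ask whether two model families scale equally well, under the assumption that a single power law describes each.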

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · 3D Shape Modeling and Analysis