UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis
Thanh-Tung Le, Tuan Pham, Tung Nguyen, Deying Kong, Xiaohui Xie, Stephan Mandt

TL;DR
UMAMI is a hybrid view synthesis model that combines deterministic rendering with stochastic generation in a single transformer-based architecture, producing high-quality, 3D-consistent images efficiently from sparse views.
Contribution
The paper presents a novel unified framework that integrates masked autoregressive diffusion and deterministic rendering, enabling scalable, high-quality novel view synthesis without handcrafted 3D biases.
Findings
Achieves state-of-the-art image quality in view synthesis.
Reduces rendering time by an order of magnitude compared to fully generative models.
Effectively handles both observed and unobserved regions in scenes.
Abstract
Novel view synthesis (NVS) seeks to render photorealistic, 3D-consistent images of a scene from unseen camera poses given only a sparse set of posed views. Existing deterministic networks render observed regions quickly but blur unobserved areas, whereas stochastic diffusion-based methods hallucinate plausible content yet incur heavy training- and inference-time costs. In this paper, we propose a hybrid framework that unifies the strengths of both paradigms. A bidirectional transformer encodes multi-view image tokens and Plücker-ray embeddings, producing a shared latent representation. Two lightweight heads then act on this representation: (i) a feed-forward regression head that renders pixels where geometry is well constrained, and (ii) a masked autoregressive diffusion head that completes occluded or unseen regions. The entire model is trained end-to-end with joint photometric and…
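The abstract mentions Plücker-ray embeddings as the camera conditioning fed to the transformer. As a rough illustration only, the sketch below computes the standard 6-D Plücker coordinates (direction d, moment o × d) for every pixel of a pinhole camera. The function name `plucker_ray_embedding`, the pixel-center convention, and the camera-to-world matrix layout are assumptions made for this example; the paper's own parameterization, normalization, and patching of the rays may differ.

```python
import numpy as np

def plucker_ray_embedding(K, cam_to_world, height, width):
    """Return an (height, width, 6) array of Plücker coordinates (d, o x d).

    K is a 3x3 pinhole intrinsics matrix and cam_to_world a 4x4 pose.
    Both conventions are assumptions of this sketch, not taken from the paper.
    """
    # Pixel centers in homogeneous image coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)

    # Back-project through the intrinsics to camera-space ray directions.
    dirs_cam = pixels @ np.linalg.inv(K).T

    # Rotate into world space and normalize to unit length.
    rotation, origin = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs_world = dirs_cam @ rotation.T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Moment m = o x d; it does not change if o slides along the ray.
    moment = np.cross(np.broadcast_to(origin, dirs_world.shape), dirs_world)
    return np.concatenate([dirs_world, moment], axis=-1)


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
    pose = np.eye(4)
    pose[:3, 3] = [1.0, 2.0, 3.0]  # hypothetical camera position
    rays = plucker_ray_embedding(K, pose, 64, 64)
    print(rays.shape)
```

Because the moment depends only on the line and not on which point along it is chosen, this representation describes each ray independently of the camera center used to define it, which is one common reason for preferring it over concatenating raw origins and directions.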
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques
