OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis

Xiang Fan; Sharath Girish; Vivek Ramanujan; Chaoyang Wang; Ashkan Mirzaei; Petr Sushko; Aliaksandr Siarohin; Sergey Tulyakov; Ranjay Krishna

arXiv:2512.10940·cs.CV·January 26, 2026

TL;DR

OmniView is a unified diffusion-based framework that synthesizes 3D and 4D views of static and dynamic scenes from a variety of inputs, and is competitive with task-specific models across multiple benchmarks.

Contribution

The paper introduces a flexible, generalist diffusion model that encodes space, time, and view conditions separately, enabling diverse 4D view synthesis tasks within a single framework.

Findings

01. Improves image quality on the multiview NVS LLFF dataset by up to 33%.

02. Reduces camera trajectory errors by 4x in text-conditioned video generation.

03. Achieves competitive performance across diverse 4D view synthesis benchmarks.

Abstract

Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video with camera control, image-to-video, amongst others. Therefore, these fragmented approaches are trained on disjoint slices of available 3D/4D data. We introduce OmniView, a unified framework that generalizes across a wide range of 4D consistency tasks. Our method separately represents space, time, and view conditions, enabling flexible combinations of these inputs. For example, OmniView can synthesize novel views from static, dynamic, and multiview inputs, extrapolate trajectories forward and backward in time, and create videos from text or image prompts with full camera control. OmniView is competitive with task-specific models across diverse benchmarks and metrics, improving image quality scores among camera-conditioned diffusion…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Multimodal Machine Learning Applications