OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
Xiang Fan, Sharath Girish, Vivek Ramanujan, Chaoyang Wang, Ashkan Mirzaei, Petr Sushko, Aliaksandr Siarohin, Sergey Tulyakov, Ranjay Krishna

TL;DR
OmniView is a unified diffusion-based framework that synthesizes 3D and 4D views of static and dynamic scenes from text or image prompts as well as static, dynamic, and multiview inputs. It is competitive with, and on some metrics better than, task-specific models across multiple benchmarks.
Contribution
It introduces a flexible, generalist diffusion model that separately encodes space, time, and view conditions, enabling diverse 4D view synthesis tasks within a single framework.
Findings
Improves image quality for multiview novel view synthesis (NVS) on the LLFF benchmark by up to 33%.
Reduces camera trajectory errors by 4x in text-conditioned video generation.
Achieves competitive performance across diverse 4D view synthesis benchmarks.
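The page does not say how camera trajectory error is measured. A common formulation in camera-control work estimates camera poses from the generated video, then sums over frames the geodesic angle between generated and requested rotations and the Euclidean distance between generated and requested translations. The sketch below assumes that formulation, assumes both trajectories are already aligned to a common frame and scale, and uses hypothetical function names; it is not necessarily the metric behind the 4x figure.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major
Vector = Sequence[float]            # 3-vector camera translation


def rotation_angle_between(r_gen: Matrix, r_ref: Matrix) -> float:
    """Geodesic angle in radians between two 3x3 rotation matrices."""
    # trace(R_gen @ R_ref^T) equals the elementwise sum of R_gen * R_ref.
    trace = sum(r_gen[i][j] * r_ref[i][j] for i in range(3) for j in range(3))
    cos_theta = (trace - 1.0) / 2.0
    # Clamp so floating-point drift cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


def trajectory_errors(
    gen_rots: Sequence[Matrix],
    ref_rots: Sequence[Matrix],
    gen_trans: Sequence[Vector],
    ref_trans: Sequence[Vector],
) -> Tuple[float, float]:
    """Hypothetical (rotation error, translation error), summed over frames.

    Assumes both trajectories are already in the same world frame and scale.
    """
    rot_err = sum(rotation_angle_between(g, r) for g, r in zip(gen_rots, ref_rots))
    trans_err = sum(math.dist(g, r) for g, r in zip(gen_trans, ref_trans))
    return rot_err, trans_err


def rot_z(theta: float) -> List[List[float]]:
    """Rotation by theta radians about the z-axis (helper for examples)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
```

For example, a two-frame trajectory whose rotations are off by 0.1 and 0.2 radians and whose second camera is displaced by one unit gives a rotation error of about 0.3 and a translation error of 1.0. Lower is better for both.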
Abstract
Prior approaches injecting camera control into diffusion models have focused on specific subsets of 4D consistency tasks: novel view synthesis, text-to-video with camera control, image-to-video, amongst others. Therefore, these fragmented approaches are trained on disjoint slices of available 3D/4D data. We introduce OmniView, a unified framework that generalizes across a wide range of 4D consistency tasks. Our method separately represents space, time, and view conditions, enabling flexible combinations of these inputs. For example, OmniView can synthesize novel views from static, dynamic, and multiview inputs, extrapolate trajectories forward and backward in time, and create videos from text or image prompts with full camera control. OmniView is competitive with task-specific models across diverse benchmarks and metrics, improving image quality scores among camera-conditioned diffusion…
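One way to picture what separately representing space, time, and view conditions buys is to describe each request as a set of frame slots, each carrying its own timestamp, camera index, and a flag saying whether the frame is observed or must be generated. The sketch below is illustrative only: the `Slot` type and the task builders are hypothetical names, not OmniView's actual interface, and they say nothing about how the model embeds these fields internally.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Slot:
    """One frame position in a hypothetical OmniView-style request."""
    time: int    # timestamp index of the frame
    view: int    # index of the camera pose for this frame
    given: bool  # True = observed input frame, False = frame to synthesize


def static_nvs(num_inputs: int, num_targets: int) -> List[Slot]:
    # Static scene: every frame shares time 0; only the camera changes.
    return [Slot(0, v, v < num_inputs) for v in range(num_inputs + num_targets)]


def dynamic_nvs(num_frames: int) -> List[Slot]:
    # Dynamic scene: a video seen from view 0 is re-rendered from view 1
    # at the same timestamps.
    given = [Slot(t, 0, True) for t in range(num_frames)]
    targets = [Slot(t, 1, False) for t in range(num_frames)]
    return given + targets


def extrapolate(num_given: int, num_new: int, backward: bool = False) -> List[Slot]:
    # Temporal extrapolation from a fixed camera: generate frames after
    # (or, with backward=True, before) the observed clip.
    given = [Slot(t, 0, True) for t in range(num_given)]
    if backward:
        new = [Slot(-t, 0, False) for t in range(1, num_new + 1)]
    else:
        new = [Slot(num_given + t, 0, False) for t in range(num_new)]
    return given + new


def image_to_video(num_frames: int) -> List[Slot]:
    # Image-to-video: one observed frame, then one new camera per frame.
    return [Slot(0, 0, True)] + [Slot(t, t, False) for t in range(1, num_frames)]


def text_to_video(num_frames: int) -> List[Slot]:
    # Text-to-video with camera control: nothing is observed, but every
    # generated frame still has its own timestamp and camera pose.
    return [Slot(t, t, False) for t in range(num_frames)]


def summarize(slots: List[Slot]) -> Dict[str, int]:
    """Count observed vs. generated frames and distinct times and views."""
    return {
        "given": sum(s.given for s in slots),
        "to_generate": sum(not s.given for s in slots),
        "timestamps": len({s.time for s in slots}),
        "views": len({s.view for s in slots}),
    }
```

In this framing, switching between static NVS, dynamic NVS, forward or backward extrapolation, and camera-controlled image- or text-to-video only changes which slots exist and which are marked `given`. That is roughly the kind of flexible combination of inputs the abstract describes, though the real model's conditioning is not specified on this page.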
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Multimodal Machine Learning Applications
