TL;DR
OmniRoam is a panoramic video generation framework enabling long-horizon scene wandering with high fidelity, controllability, and consistency, leveraging a novel two-stage process and new datasets.
Contribution
The paper introduces a controllable panoramic video generation framework with a preview stage and a refine stage, and provides new datasets for training and evaluation.
Findings
Outperforms state-of-the-art methods in visual quality and consistency
Enables high-resolution, long-range scene wandering
Supports real-time generation and 3D reconstruction extensions
Abstract
Modeling scenes using video generation models has garnered growing research interest in recent years. However, most existing approaches rely on perspective video models that synthesize only limited observations of a scene, leading to issues of completeness and global consistency. We propose OmniRoam, a controllable panoramic video generation framework that exploits the rich per-frame scene coverage and inherent long-term spatial and temporal consistency of panoramic representation, enabling long-horizon scene wandering. Our framework begins with a preview stage, where a trajectory-controlled video generation model creates a quick overview of the scene from a given input image or video. Then, in the refine stage, this video is temporally extended and spatially upsampled to produce long-range, high-resolution videos, thus enabling high-fidelity world wandering. To train our model, we…
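To make the abstract's point about "limited observations" concrete, the following minimal sketch estimates how much of the full viewing sphere a single perspective frame covers, compared with a panoramic frame that covers all of it. It is not taken from the paper: the function name `perspective_coverage` and the example field-of-view values are assumptions chosen for illustration, and the calculation uses the standard solid-angle formula for an ideal pinhole camera with a rectangular field of view below 180 degrees.

```python
import math


def perspective_coverage(h_fov_deg: float, v_fov_deg: float) -> float:
    """Fraction of the full sphere seen by an ideal pinhole camera.

    Hypothetical helper for illustration; assumes both FOVs are in (0, 180).
    """
    h = math.radians(h_fov_deg)
    v = math.radians(v_fov_deg)
    # Solid angle of a rectangular viewing pyramid:
    #   omega = 4 * arcsin(sin(h/2) * sin(v/2))
    solid_angle = 4.0 * math.asin(math.sin(h / 2.0) * math.sin(v / 2.0))
    # The full sphere subtends 4*pi steradians.
    return solid_angle / (4.0 * math.pi)


if __name__ == "__main__":
    # Example FOV values are illustrative, not from the paper.
    for h_fov, v_fov in [(60.0, 40.0), (90.0, 90.0), (120.0, 90.0)]:
        frac = perspective_coverage(h_fov, v_fov)
        print(f"{h_fov:.0f} x {v_fov:.0f} deg: {frac:.1%} of the sphere")
```

A 90° × 90° view covers exactly one sixth of the sphere (one face of a cube map), so a perspective model observes only a small part of the scene in each frame, whereas a full panoramic frame covers the whole sphere.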
