TL;DR
PanoWorld generates 360-degree panoramic video from a single image and a caption, using geometry-aware regularization to keep depth and motion consistent across frames and improve the spatial coherence of the generated scene.
Contribution
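It reframes panoramic video synthesis as geometry- and dynamics-consistent latent state modeling, adds two lightweight regularizers (depth consistency and trajectory consistency) to a pre-trained perspective video world model, and introduces a new dataset, PanoGeo, improving geometric coherence in the generated panoramas.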
Findings
Improves geometric consistency over prior panoramic video generation methods
Maintains competitive visual realism
Introduces PanoGeo dataset for training and evaluation
Abstract
We present PanoWorld, a panoramic video world model that generates geometry-consistent 360-degree video from a single image and a caption. Existing panoramic video methods optimize primarily for visual realism and do not explicitly constrain the underlying 3D scene state, producing outputs that appear plausible yet exhibit inconsistent depth, broken correspondences, and implausible motion across the spherical surface. We address this gap by framing panoramic video generation as a geometry- and dynamics-consistent latent state modeling problem rather than pure visual synthesis. Building on a pre-trained perspective video world model, we introduce two lightweight regularizers: a depth consistency loss against pseudo ground-truth panoramic depth, and a trajectory consistency loss that supervises the 3D world-frame positions of tracked points across time. We further apply…
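The abstract names the two regularizers but does not give their formulas. The sketch below is one plausible reading, not PanoWorld's implementation, and rests on stated assumptions: depth maps are equirectangular grids (the abstract does not name the projection), the depth term is an L1 error weighted by cos(latitude) to offset the oversampling of pixels near the poles (our choice), the trajectory term is the mean Euclidean distance between predicted and target world-frame positions of tracked points, and both terms are added to a base generation loss with weights. The function names and the weights `lambda_depth` and `lambda_traj` are hypothetical.

```python
import math
from typing import Optional, Sequence

Grid = Sequence[Sequence[float]]   # depth map indexed [row][col]
Point3 = Sequence[float]           # (x, y, z) in the world frame


def latitude_weight(row: int, height: int) -> float:
    """cos(latitude) at the centre of `row` in an equirectangular image (assumed projection)."""
    lat = math.pi / 2 - (row + 0.5) * math.pi / height
    return math.cos(lat)


def depth_consistency_loss(
    pred_depth: Grid,
    pseudo_gt_depth: Grid,
    valid_mask: Optional[Sequence[Sequence[bool]]] = None,
) -> float:
    """Hypothetical depth term: latitude-weighted mean absolute error vs. pseudo ground truth."""
    height = len(pred_depth)
    weighted_err, weight_sum = 0.0, 0.0
    for r, (pred_row, gt_row) in enumerate(zip(pred_depth, pseudo_gt_depth)):
        w = latitude_weight(r, height)
        for c, (p, g) in enumerate(zip(pred_row, gt_row)):
            if valid_mask is not None and not valid_mask[r][c]:
                continue  # skip pixels where the pseudo ground truth is unreliable
            weighted_err += w * abs(p - g)
            weight_sum += w
    return weighted_err / weight_sum if weight_sum > 0 else 0.0


def trajectory_consistency_loss(
    pred_tracks: Sequence[Sequence[Point3]],
    target_tracks: Sequence[Sequence[Point3]],
    visibility: Optional[Sequence[Sequence[bool]]] = None,
) -> float:
    """Hypothetical trajectory term: mean 3D distance between predicted and target positions.

    pred_tracks[t][k] is the world-frame position of tracked point k at frame t.
    """
    total, count = 0.0, 0
    for t, (pred_frame, tgt_frame) in enumerate(zip(pred_tracks, target_tracks)):
        for k, (p, q) in enumerate(zip(pred_frame, tgt_frame)):
            if visibility is not None and not visibility[t][k]:
                continue  # occluded or lost tracks contribute nothing
            total += math.dist(p, q)
            count += 1
    return total / count if count else 0.0


def regularized_loss(
    base_loss: float,
    depth_term: float,
    traj_term: float,
    lambda_depth: float = 0.1,
    lambda_traj: float = 0.1,
) -> float:
    """Base generation loss plus the two weighted regularizers (weights are placeholders)."""
    return base_loss + lambda_depth * depth_term + lambda_traj * traj_term


if __name__ == "__main__":
    d = depth_consistency_loss([[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]])  # 1.0
    t = trajectory_consistency_loss([[(0.0, 0.0, 0.0)]], [[(3.0, 4.0, 0.0)]])       # 5.0
    print(regularized_loss(0.5, d, t))
```

Keeping both terms as plain additive penalties matches the abstract's description of them as lightweight regularizers on top of a pre-trained model; in a real training loop they would be computed on tensors so gradients reach the generator.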
