WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Hanyang Kong; Xingyi Yang; Xiaoxu Zheng; Xinchao Wang

arXiv:2512.19678·cs.CV·December 23, 2025

TL;DR

WorldWarp pairs an online 3D geometric cache with a 2D diffusion-based refiner to generate long-range, geometrically consistent video, targeting the occluded regions and complex camera trajectories where current methods struggle.

Contribution

WorldWarp couples an online 3D structural cache, built via 3D Gaussian Splatting (3DGS), with a Spatio-Temporal Diffusion (ST-Diff) model that addresses the holes and artifacts left by warping cached content into novel views.

Findings

1. Achieves state-of-the-art fidelity in 3D consistent video generation.
2. Effectively handles occlusions and complex camera trajectories.
3. Maintains geometric consistency across video chunks.

Abstract

Generating long-range, geometrically consistent video presents a fundamental dilemma: while consistency demands strict adherence to 3D geometry in pixel space, state-of-the-art generative models operate most effectively in a camera-conditioned latent space. This disconnect causes current methods to struggle with occluded areas and complex camera trajectories. To bridge this gap, we propose WorldWarp, a framework that couples a 3D structural anchor with a 2D generative refiner. To establish geometric grounding, WorldWarp maintains an online 3D geometric cache built via Gaussian Splatting (3DGS). By explicitly warping historical content into novel views, this cache acts as a structural scaffold, ensuring each new frame respects prior geometry. However, static warping inevitably leaves holes and artifacts due to occlusions. We address this using a Spatio-Temporal Diffusion (ST-Diff) model…
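To make the warping step in the abstract concrete, the sketch below forward-warps a tiny depth-annotated image into a translated camera and reports which target pixels receive no source content. It is an illustration only, under loud assumptions: it uses a per-pixel point reprojection with a z-buffer rather than 3DGS rendering, it omits camera rotation, and the function name `forward_warp` and all of its parameters are hypothetical, not taken from the WorldWarp code.

```python
def forward_warp(color, depth, focal, cx, cy, t):
    """Forward-warp a source view into a camera translated by t = (tx, ty, tz).

    color, depth: H x W nested lists (depth along the camera z axis).
    Rotation is assumed to be identity to keep the sketch short.
    Returns (warped, hole_mask); hole_mask[v][u] is True where no
    source pixel landed, i.e. a disoccluded region that still needs content.
    """
    h, w = len(depth), len(depth[0])
    warped = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            # Unproject pixel (u, v) to a 3D point in the source camera frame.
            x = (u - cx) * z / focal
            y = (v - cy) * z / focal
            # Express the point in the target camera frame.
            xt, yt, zt = x - t[0], y - t[1], z - t[2]
            if zt <= 0:
                continue  # point is behind the target camera
            # Project into the target image and snap to the nearest pixel.
            ut = round(focal * xt / zt + cx)
            vt = round(focal * yt / zt + cy)
            # Z-buffer test: keep only the nearest surface per target pixel.
            if 0 <= ut < w and 0 <= vt < h and zt < zbuf[vt][ut]:
                zbuf[vt][ut] = zt
                warped[vt][ut] = color[v][u]
    hole_mask = [[px is None for px in row] for row in warped]
    return warped, hole_mask
```

With a 1x4 image at constant depth and a camera moved one unit to the right, the content shifts one pixel to the left and the rightmost target pixel is flagged as a hole. Regions like that are what the abstract says static warping cannot fill on its own.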

Code & Models

Models

🤗 imsuperkong/worldwarp · model · ♡ 6

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging