VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control

Sixiao Zheng; Minghao Yin; Wenbo Hu; Xiaoyu Li; Ying Shan; Yanwei Fu

arXiv:2601.05138 · cs.CV · March 31, 2026

TL;DR

VerseCrafter introduces a 4D geometric control framework for realistic, controllable video generation that captures dynamic object and camera motions with high fidelity.

Contribution

The paper proposes a novel 4D geometric control representation and a scalable training dataset, enabling unified, precise control over camera and object motion in generated videos of dynamic scenes.

Findings

01. Achieves superior visual quality compared to prior methods.

02. Provides more accurate control over camera and object motions.

03. Demonstrates effectiveness on a large real-world dataset.

Abstract

Video world models aim to simulate dynamic, real-world environments, yet existing methods struggle to provide unified and precise control over camera and multi-object motion, as videos inherently capture dynamics in the projected 2D image plane. To bridge this gap, we introduce VerseCrafter, a geometry-driven video world model that generates dynamic, realistic videos from a unified 4D geometric world state. Our approach is centered on a novel 4D Geometric Control representation, which encodes the world state as a static background point cloud and per-object 3D Gaussian trajectories. This representation captures each object's motion path and probabilistic 3D occupancy over time, providing a flexible, category-agnostic alternative to rigid bounding boxes and parametric models. We render 4D Geometric Control into 4D control maps for a pretrained video diffusion model, enabling…
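To make the world state described in the abstract concrete, here is a minimal sketch of how a static background point cloud and per-object 3D Gaussian trajectories could be held in memory. All names (`GaussianTrajectory`, `WorldState`, `project_points`, `occupancy`) are hypothetical and are not taken from the VerseCrafter repository. The pinhole projection and the unnormalised Gaussian density are standard textbook choices assumed here for illustration, not the paper's rendering procedure for its 4D control maps.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class GaussianTrajectory:
    """Hypothetical container: one 3D Gaussian per frame for a single object."""
    means: np.ndarray        # (T, 3) object centre at each frame
    covariances: np.ndarray  # (T, 3, 3) spatial extent at each frame


@dataclass
class WorldState:
    """Hypothetical container for the 4D world state sketched in the abstract."""
    background_points: np.ndarray  # (N, 3) static background point cloud
    objects: list = field(default_factory=list)  # list of GaussianTrajectory


def project_points(points, K, R, t):
    """Assumed pinhole model: map world points (M, 3) to pixels (M, 2) and depth (M,)."""
    cam = points @ R.T + t            # world frame -> camera frame
    depth = cam[:, 2]
    uv = (cam @ K.T)[:, :2] / depth[:, None]
    return uv, depth


def occupancy(traj, frame, query):
    """Unnormalised Gaussian density exp(-0.5 * d^T Sigma^-1 d) at query points."""
    d = np.atleast_2d(query) - traj.means[frame]
    solved = np.linalg.solve(traj.covariances[frame], d.T).T  # Sigma^-1 d
    return np.exp(-0.5 * np.sum(d * solved, axis=1))


# Usage: one object moving along x over 3 frames, with a unit covariance.
traj = GaussianTrajectory(
    means=np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0], [1.0, 0.0, 2.0]]),
    covariances=np.tile(np.eye(3), (3, 1, 1)),
)
state = WorldState(background_points=np.zeros((0, 3)), objects=[traj])

K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
uv, depth = project_points(traj.means, K, np.eye(3), np.zeros(3))
```

Storing a mean and covariance per frame gives both a motion path and a soft occupancy volume, which is the property the abstract contrasts with rigid bounding boxes. How VerseCrafter actually parameterises and renders these quantities is not specified in the excerpt above.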

Code & Models

Repositories

tencentarc/VerseCrafter
github

Models

TencentARC/VerseCrafter
Hugging Face model · 165 downloads · 16 likes
