X-World: Controllable Ego-Centric Multi-Camera World Models for Scalable End-to-End Driving
Chaoda Zheng, Sean Li, Jinhao Deng, Zhennan Wang, Shijia Chen, Liqiang Xiao, Ziheng Chi, Hongbin Lin, Kangjie Chen, Boyang Wang, Yu Zhang, Xianming Liu

TL;DR
X-World is a controllable multi-camera generative world model for autonomous driving that produces realistic, long-term, multi-view video simulations with scene and appearance controls, enabling scalable evaluation.
Contribution
The paper introduces X-World, a multi-camera generative world model that simulates future driving scenes under commanded actions and supports scene editing, with the aim of making autonomous driving evaluation scalable and reproducible.
Findings
Achieves high-quality, multi-view video generation with strong view consistency.
Maintains stable and coherent long-term scene dynamics.
Supports flexible scene and appearance controls, including traffic and weather.
Abstract
Scalable and reliable evaluation is increasingly critical in the end-to-end era of autonomous driving, where vision-language-action (VLA) policies directly map raw sensor streams to driving actions. Yet current evaluation pipelines still rely heavily on real-world road testing, which is costly, biased toward limited scenario coverage, and difficult to reproduce. These challenges motivate a real-world simulator that can generate realistic future observations under proposed actions while remaining controllable and stable over long horizons. We present X-World, an action-conditioned multi-camera generative world model that simulates future observations directly in video space. Given synchronized multi-view camera history and a future action sequence, X-World generates future multi-camera video streams that follow the commanded actions. To ensure reproducible and editable scene…
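The input/output contract described in the abstract (multi-view history plus a future action sequence in, future multi-view video out) can be illustrated with a minimal sketch. Everything below is hypothetical: `PlaceholderWorldModel` merely repeats the last observed frame instead of generating new content, the tensor layout `(T, V, H, W, C)` and the two-dimensional action vector are assumptions, and the chunked autoregressive `rollout` (feeding generated frames back as context) is one plausible way to reach long horizons, not a mechanism confirmed by the abstract.

```python
import numpy as np


class PlaceholderWorldModel:
    """Hypothetical stand-in for an action-conditioned multi-camera world model.

    It ignores the actions and repeats the last observed frame of every camera;
    a real model would synthesize new frames conditioned on history and actions.
    """

    def generate(self, history: np.ndarray, actions: np.ndarray) -> np.ndarray:
        # history: (T_hist, V, H, W, C) synchronized multi-view frames (assumed layout)
        # actions: (T_future, A) future action sequence (A is an assumption)
        last = history[-1]  # (V, H, W, C): most recent frame from every camera
        return np.repeat(last[None], actions.shape[0], axis=0)


def rollout(model, history: np.ndarray, actions: np.ndarray,
            chunk: int = 4, context: int = 2) -> np.ndarray:
    """Generate future multi-view video for `actions`, one chunk at a time.

    Assumption: each chunk is conditioned on the most recent `context` frames,
    which may include frames generated in earlier chunks.
    """
    if history.ndim != 5:
        raise ValueError("history must have shape (T_hist, V, H, W, C)")
    frames = list(history)  # running buffer of real + generated frames
    outputs = []
    for start in range(0, actions.shape[0], chunk):
        ctx = np.stack(frames[-context:])          # recent multi-view context
        step_actions = actions[start:start + chunk]
        new = model.generate(ctx, step_actions)    # one frame set per action step
        assert new.shape[0] == step_actions.shape[0]
        frames.extend(new)
        outputs.append(new)
    return np.concatenate(outputs, axis=0)


# Usage with dummy data: 3 past steps from 6 cameras, 10 future action steps.
history = np.zeros((3, 6, 8, 16, 3), dtype=np.uint8)
actions = np.zeros((10, 2), dtype=np.float32)
video = rollout(PlaceholderWorldModel(), history, actions)
print(video.shape)  # (10, 6, 8, 16, 3)
```

The point of the sketch is the shape bookkeeping: the output has one multi-camera frame set per commanded action step, and every camera view is produced together at each step, which is where a real model's cross-view consistency would matter.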
