OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space
Zhuding Liang, Tianyi Yan, Dubing Chen, Jiasen Zheng, Huan Zheng, Cheng-zhong Xu, Yida Wang, Kun Zhan, Jianbing Shen

TL;DR
OccDirector is a novel framework that generates realistic 4D occupancy dynamics for autonomous driving simulation based solely on natural language instructions, enabling complex multi-agent interactions without geometric priors.
Contribution
The paper introduces OccDirector, a language-conditioned 4D occupancy generation framework, and OccInteract-85k, a new dataset with multi-level language annotations for autonomous driving scenarios.
Findings
Achieves state-of-the-art quality in 4D occupancy generation.
Demonstrates strong instruction-following capabilities in complex scenarios.
Introduces a new dataset (OccInteract-85k) and an evaluation benchmark for language-guided behavior generation.
Abstract
Generative world models increasingly rely on 4D occupancy for realistic autonomous driving simulation. However, existing generation frameworks depend on rigid geometric conditions (e.g., explicit trajectories) or simplistic attribute-level text, failing to orchestrate complex, sequential multi-agent interactions. To address this semantic-spatiotemporal gap, we propose OccDirector, a pioneering framework that generates 4D occupancy dynamics conditioned solely on natural language. Operating as a "scenario director", OccDirector maps natural language scripts into physically plausible voxel dynamics without requiring geometric priors. Technically, it employs a VLM-driven Spatio-Temporal MMDiT equipped with a history-prefix anchoring strategy to ensure long-horizon interaction consistency. Furthermore, we introduce OccInteract-85k, a novel dataset uniquely annotated with multi-level…
