GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation

Tianchen Deng; Xuefeng Chen; Yi Chen; Qu Chen; Yuyao Xu; Lijin Yang; Le Xu; Yu Zhang; Bo Zhang; Wuxiong Huang; Hesheng Wang

arXiv:2512.23180 · cs.CV · May 19, 2026

TL;DR

GaussianDWM introduces a unified 3D Gaussian scene representation for driving world models, enabling enhanced scene understanding and multi-modal generation with aligned textual information.

Contribution

It proposes a novel 3D Gaussian scene representation that aligns textual features with 3D scenes and integrates language-guided sampling for improved multi-modal driving environment modeling.
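As a rough illustration of the idea described above, the sketch below attaches a language feature vector to each Gaussian primitive and then selects the primitives most similar to a text query. It is not taken from the paper or its repository: the class `LanguageGaussian`, the field `lang_feat`, the helper `sample_by_text`, and the use of cosine-similarity top-k selection are all assumptions made for illustration, and the two-dimensional features stand in for real text embeddings.

```python
from dataclasses import dataclass
from typing import List, Tuple
import math


@dataclass
class LanguageGaussian:
    """Hypothetical primitive: usual 3D Gaussian fields plus a language feature."""
    mean: Tuple[float, float, float]            # 3D center of the Gaussian
    lang_feat: List[float]                      # assumed per-Gaussian text-aligned embedding
    scale: Tuple[float, float, float] = (1.0, 1.0, 1.0)
    rotation: Tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)  # quaternion (w, x, y, z)
    opacity: float = 1.0
    color: Tuple[float, float, float] = (0.5, 0.5, 0.5)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def sample_by_text(gaussians: List[LanguageGaussian],
                   query_feat: List[float],
                   k: int) -> List[LanguageGaussian]:
    """Assumed sampling rule: keep the k Gaussians most similar to the query."""
    ranked = sorted(gaussians,
                    key=lambda g: cosine(g.lang_feat, query_feat),
                    reverse=True)
    return ranked[:k]


# Toy scene with made-up 2D features standing in for real text embeddings.
scene = [
    LanguageGaussian(mean=(0.0, 0.0, 0.0), lang_feat=[1.0, 0.0]),
    LanguageGaussian(mean=(5.0, 0.0, 0.0), lang_feat=[0.0, 1.0]),
    LanguageGaussian(mean=(9.0, 0.0, 0.0), lang_feat=[0.6, 0.8]),
]
picked = sample_by_text(scene, query_feat=[0.0, 1.0], k=2)
```

In this toy scene, `picked` holds the two Gaussians whose features point closest to the query direction. A real system would presumably obtain both the per-Gaussian features and the query embedding from a learned text or vision-language encoder, which this sketch does not model.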

Findings

01

Achieves state-of-the-art performance on nuScenes and NuInteract datasets.

02

Effectively aligns textual information with 3D scenes using Gaussian primitives.

03

Demonstrates improved multi-modal scene generation and understanding.

Abstract

Driving World Models (DWMs) have been developing rapidly with the advances of generative models. However, existing DWMs lack 3D scene understanding capabilities and can only generate content conditioned on input data, without the ability to interpret or reason about the driving environment. Moreover, current approaches that represent 3D spatial information with point cloud or BEV features do not accurately align textual information with the underlying 3D scene. To address these limitations, we propose a novel unified DWM framework based on a 3D Gaussian scene representation, which enables both 3D scene understanding and multi-modal scene generation, while also providing contextual enrichment for understanding and generation tasks. Our approach directly aligns textual information with the 3D scene by embedding rich linguistic features into each Gaussian primitive, thereby achieving early…

Code & Models

Repositories

dtc111111/GaussianDWM
github

Taxonomy

Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis