WcDT: World-centric Diffusion Transformer for Traffic Scene Generation
Chen Yang, Yangfan He, Aaron Xuxiang Tian, Dong Chen, Jianhui Wang, Tianyu Shi, Arsalan Heydarian, Pei Liu

TL;DR
This paper presents WcDT, a novel traffic scene generation framework combining diffusion models and transformers to produce diverse, realistic autonomous driving trajectories for simulation systems.
Contribution
The paper introduces a new world-centric diffusion transformer framework that integrates diffusion models with transformer-based encoding for improved traffic scene trajectory generation.
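For orientation only, the sketch below shows the standard DDPM forward (noising) process and the simplified noise-prediction loss from Ho et al. (2020), written in plain Python. The abstract says WcDT builds on DDPM, but the linear β schedule (1e-4 to 0.02 over 1000 steps), the two-component "move" vector, and every function name here are assumptions for illustration, not the authors' configuration. The denoiser, which WcDT builds from DiT blocks, is replaced by a zero predictor.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule as in the original DDPM paper (assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Closed-form forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t]
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, noise)]


def epsilon_loss(predicted_noise, true_noise):
    """Simplified DDPM objective: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_noise, true_noise)) / len(true_noise)


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Hypothetical per-agent move vector (e.g. step length, heading change); layout is assumed.
x0 = [0.8, -0.1]
noise = [rng.gauss(0.0, 1.0) for _ in x0]
t = rng.randrange(len(betas))
x_t = q_sample(x0, t, alpha_bars, noise)

# A trained denoiser would predict `noise` from (x_t, t); a zero predictor stands in here.
loss = epsilon_loss([0.0] * len(noise), noise)
```

During training, the denoiser sees only `x_t` and `t` and learns to recover `noise`. Sampling then runs the learned reverse process from pure Gaussian noise, which is one source of the stochasticity the summary credits for scene diversity.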
Findings
Superior performance in generating realistic trajectories
Enhanced scene diversity and stochasticity
Effective integration of diffusion models with transformers
Abstract
In this paper, we introduce a novel approach for autonomous driving trajectory generation by harnessing the complementary strengths of diffusion probabilistic models (a.k.a., diffusion models) and transformers. Our proposed framework, termed the "World-Centric Diffusion Transformer" (WcDT), optimizes the entire trajectory generation process, from feature extraction to model inference. To enhance the scene diversity and stochasticity, the historical trajectory data is first preprocessed into "Agent Move Statement" and encoded into latent space using Denoising Diffusion Probabilistic Models (DDPM) enhanced with Diffusion with Transformer (DiT) blocks. Then, the latent features, historical trajectories, HD map features, and historical traffic signal information are fused with various transformer-based encoders that are used to enhance the interaction of agents with other elements in the…
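The fusion step relies on transformer attention between agent features and the other scene elements. The sketch below is a minimal single-head scaled dot-product cross-attention in plain Python. Hypothetical agent tokens act as queries, and hypothetical map or traffic-signal tokens act as keys and values. The sketch omits several components that real transformer encoders use: learned query, key, and value projections, multiple heads, residual connections, and layer normalization. It is not the WcDT encoder itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head attention: for each query, softmax(q . k / sqrt(d)) weights over values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Hypothetical 2-D features: two agents attend over three map/signal tokens.
agent_tokens = [[1.0, 0.0], [0.0, 1.0]]
scene_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = cross_attention(agent_tokens, scene_tokens, scene_tokens)
```

Each agent's output is a weighted mix of scene tokens. Tokens more similar to the agent's query receive more weight, which is the mechanism by which attention lets agents "interact" with map and signal context.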
