A Self-Conditioned Representation Guided Diffusion Model for Realistic Text-to-LiDAR Scene Generation

Wentao Qu; Guofeng Mei; Yang Wu; Yongshun Gong; Xiaoshui Huang; Liang Xiao

arXiv:2511.19004 · cs.CV · December 16, 2025

TL;DR

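This paper introduces T2LDM, a diffusion model with self-conditioned representation guidance for realistic text-to-LiDAR scene generation. It targets two problems: overly smooth scenes caused by scarce Text-LiDAR training pairs, and degraded quality and controllability caused by low-quality text descriptions. It also supports multiple conditional tasks.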

Contribution

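The paper proposes Self-Conditioned Representation Guidance (SCRG), a guidance mechanism that supervises the diffusion model's denoising network during training to improve scene generation. It also introduces T2nuScenes, a content-composable Text-LiDAR benchmark, together with a controllability analysis for Text-LiDAR tasks.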

Findings

1. T2LDM outperforms existing methods in scene quality.
2. The model captures rich geometric structures from the data distribution, generating detailed objects in scenes.
3. The proposed techniques improve controllability and fidelity.

Abstract

Text-to-LiDAR generation can customize 3D data with rich structures and diverse scenes for downstream tasks. However, the scarcity of Text-LiDAR pairs often leaves the model with insufficient training priors, so it generates overly smooth 3D scenes. Moreover, low-quality text descriptions may degrade generation quality and controllability. In this paper, we propose a Text-to-LiDAR Diffusion Model for scene generation, named T2LDM, with a Self-Conditioned Representation Guidance (SCRG). Specifically, SCRG aligns to real representations and thereby provides the Denoising Network (DN) with soft supervision carrying reconstruction details during training, while being decoupled at inference. In this way, T2LDM can perceive rich geometric structures from the data distribution and generate detailed objects in scenes. Meanwhile, we construct a content-composable Text-LiDAR benchmark, T2nuScenes, along with a controllability…
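The abstract describes SCRG only at a high level: an alignment to real representations that gives the DN extra supervision during training and is removed at inference. The Python sketch below illustrates that general training-only auxiliary-alignment pattern under loudly stated assumptions, none of which come from the paper: a DDPM-style noise-prediction objective, a DN that returns an intermediate feature alongside its noise prediction, a hypothetical `encoder` that produces the "real" representation of the clean sample, a hypothetical linear projection head `proj`, and a cosine alignment loss weighted by a hypothetical `lam`. It is not T2LDM's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical DN signature (assumption): (x_t, alpha_bar) -> (predicted noise, intermediate feature)
Denoiser = Callable[[Vector, float], Tuple[Vector, Vector]]


def add_noise(x0: Sequence[float], eps: Sequence[float], alpha_bar: float) -> Vector:
    """DDPM-style forward process (assumed): x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, guarding against division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norms, 1e-12)


def project(weights: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Hypothetical linear head mapping DN features into the representation space."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def training_loss(
    dn: Denoiser,
    encoder: Callable[[Vector], Vector],
    proj: Sequence[Sequence[float]],
    x0: Vector,
    eps: Vector,
    alpha_bar: float,
    lam: float = 0.5,
) -> float:
    """Noise-prediction loss plus a representation-alignment term (used in training only)."""
    x_t = add_noise(x0, eps, alpha_bar)
    pred_eps, feat = dn(x_t, alpha_bar)
    # Representation of the real (clean) sample, used as a fixed alignment target.
    z_real = encoder(x0)
    align = 1.0 - cosine(project(proj, feat), z_real)
    return mse(pred_eps, eps) + lam * align


def predict_noise(dn: Denoiser, x_t: Vector, alpha_bar: float) -> Vector:
    """Inference: the alignment branch is dropped and only the DN's noise prediction is used."""
    pred_eps, _unused_feat = dn(x_t, alpha_bar)
    return pred_eps


if __name__ == "__main__":
    x0, eps = [1.0, 2.0, 3.0], [0.1, -0.2, 0.3]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

    def oracle_dn(x_t: Vector, alpha_bar: float) -> Tuple[Vector, Vector]:
        # Toy "perfect" denoiser that knows x0: it recovers eps and emits x0 as its feature.
        rec = [(xt - math.sqrt(alpha_bar) * x) / math.sqrt(1.0 - alpha_bar) for xt, x in zip(x_t, x0)]
        return rec, list(x0)

    # Prints a value close to 0.0: the noise is recovered and the feature matches the target.
    print(training_loss(oracle_dn, lambda v: list(v), identity, x0, eps, alpha_bar=0.5))
```

The design point this sketch tries to capture is the decoupling: the alignment term only shapes the DN's weights through the training loss, so `predict_noise` needs neither `encoder` nor `proj` at sampling time.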

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications