Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes

Feng Lv, Haoxuan Feng, Zilu Zhang, Chunlong Xia, and Yanfeng Li

arXiv:2511.12932 · cs.CV · December 1, 2025

Open Access

TL;DR

Text2Traffic is a unified framework that improves text-driven traffic scene image generation and editing by enhancing semantic richness, visual fidelity, and multi-view diversity through a controllable mask mechanism and a two-stage training process.

Contribution

It introduces a novel controllable mask mechanism and a two-stage training paradigm for improved traffic scene image synthesis and editing from text descriptions.

Findings

01. Achieves state-of-the-art performance in traffic scene image generation.

02. Enhances small traffic element fidelity with mask-region-weighted loss.

03. Increases geometric diversity using multi-view data.
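The listing does not give the form of the mask-region-weighted loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a squared-error objective over flattened pixel values and a binary mask marking the traffic-element region; the function name `mask_weighted_mse` and the parameter `region_weight` are hypothetical.

```python
from typing import Sequence


def mask_weighted_mse(
    pred: Sequence[float],
    target: Sequence[float],
    mask: Sequence[float],
    region_weight: float = 5.0,
) -> float:
    """Weighted mean squared error over flattened pixels.

    Assumption (not from the paper): pixels with mask == 1 count
    `region_weight` times as much as pixels with mask == 0, and the
    result is normalized by the sum of the weights.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target, and mask must have the same length")
    if len(pred) == 0:
        raise ValueError("inputs must not be empty")

    weighted_sum = 0.0
    weight_total = 0.0
    for p, t, m in zip(pred, target, mask):
        # Weight is 1 outside the mask and region_weight inside it.
        w = 1.0 + (region_weight - 1.0) * m
        weighted_sum += w * (p - t) ** 2
        weight_total += w
    return weighted_sum / weight_total


# The same one-pixel error costs more when it falls inside the masked region.
mask = [1, 0, 0, 0]
inside = mask_weighted_mse([0, 0, 0, 0], [1, 0, 0, 0], mask)
outside = mask_weighted_mse([0, 0, 0, 0], [0, 1, 0, 0], mask)
print(inside, outside)
```

Under these assumptions, an error on a masked pixel is penalized `region_weight` times more than the same error elsewhere, which is one plausible way such a loss could push a model toward sharper small elements.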

Abstract

With the rapid advancement of intelligent transportation systems, text-driven image generation and editing techniques have demonstrated significant potential in providing rich, controllable visual scene data for applications such as traffic monitoring and autonomous driving. However, several challenges remain, including insufficient semantic richness of generated traffic elements, limited camera viewpoints, low visual fidelity of synthesized images, and poor alignment between textual descriptions and generated content. To address these issues, we propose a unified text-driven framework for both image generation and editing, leveraging a controllable mask mechanism to seamlessly integrate the two tasks. Furthermore, we incorporate both vehicle-side and roadside multi-view data to enhance the geometric diversity of traffic scenes. Our training strategy follows a two-stage paradigm: first,…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques