Text2Traffic: A Text-to-Image Generation and Editing Method for Traffic Scenes
Feng Lv, Haoxuan Feng, Zilu Zhang, Chunlong Xia, and Yanfeng Li

TL;DR
Text2Traffic is a unified framework that improves text-driven traffic scene image generation and editing by enhancing semantic richness, visual fidelity, and multi-view diversity through a controllable mask mechanism and a two-stage training process.
Contribution
It introduces a novel controllable mask mechanism and a two-stage training paradigm for improved traffic scene image synthesis and editing from text descriptions.
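One way to picture how a single mask can cover both tasks is the sketch below. The summary does not describe the paper's actual mechanism, so this is an assumption: it uses standard inpainting-style blending, where a mask value of 1 marks a pixel to synthesize and 0 marks a pixel to keep. Under that assumption, an all-ones mask corresponds to generation from scratch and a partial mask corresponds to editing. The names `blend_with_mask` and `full_mask` are hypothetical, and images are flattened to plain lists of pixel values for brevity.

```python
from typing import List, Sequence


def full_mask(num_pixels: int) -> List[int]:
    """Hypothetical helper: a mask of all ones, i.e. regenerate every pixel."""
    return [1] * num_pixels


def blend_with_mask(
    source: Sequence[float],
    generated: Sequence[float],
    mask: Sequence[int],
) -> List[float]:
    """Keep source pixels where mask == 0, take generated pixels where mask == 1.

    Assumption: this inpainting-style blend stands in for the paper's
    controllable mask mechanism, whose details are not given in the summary.
    """
    if not (len(source) == len(generated) == len(mask)):
        raise ValueError("source, generated, and mask must have the same length")
    return [m * g + (1 - m) * s for s, g, m in zip(source, generated, mask)]


source_image = [10.0, 20.0, 30.0]
model_output = [1.0, 2.0, 3.0]

# Generation: the mask covers the whole image, so nothing from the source survives.
generated_image = blend_with_mask(source_image, model_output, full_mask(3))

# Editing: only the middle pixel is masked, so the rest of the source is preserved.
edited_image = blend_with_mask(source_image, model_output, [0, 1, 0])
```

In this sketch the only difference between the two tasks is the mask passed in, which is the sense in which one mechanism could serve both.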
Findings
Achieves state-of-the-art performance in traffic scene image generation.
Enhances small traffic element fidelity with mask-region-weighted loss.
Increases geometric diversity using multi-view data.
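The mask-region-weighted loss mentioned in the findings can be illustrated with a minimal sketch. The summary does not give the paper's exact formulation, so the version below is an assumption: a mean squared error in which pixels inside the mask receive a larger weight (`region_weight`, a hypothetical parameter), normalized by the total weight. The function name `mask_weighted_mse` is also hypothetical, and images are flattened to plain lists.

```python
from typing import Sequence


def mask_weighted_mse(
    pred: Sequence[float],
    target: Sequence[float],
    mask: Sequence[int],
    region_weight: float = 5.0,
) -> float:
    """Weighted mean squared error that emphasizes masked pixels.

    Pixels with mask == 1 get weight `region_weight`; all others get weight 1.
    Assumption: this weighting scheme is illustrative, not the paper's formula.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target, and mask must have the same length")
    weighted_sum = 0.0
    weight_total = 0.0
    for p, t, m in zip(pred, target, mask):
        weight = 1.0 + (region_weight - 1.0) * m
        weighted_sum += weight * (p - t) ** 2
        weight_total += weight
    return weighted_sum / weight_total


mask = [1, 0, 0, 0]  # the first pixel belongs to a small traffic element

# The same unit error costs more inside the mask than outside it.
loss_inside = mask_weighted_mse([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], mask)
loss_outside = mask_weighted_mse([0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], mask)
```

Because small elements such as signs or distant vehicles occupy few pixels, a uniform loss barely registers errors on them. Weighting the masked region raises their share of the total loss.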
Abstract
With the rapid advancement of intelligent transportation systems, text-driven image generation and editing techniques have demonstrated significant potential in providing rich, controllable visual scene data for applications such as traffic monitoring and autonomous driving. However, several challenges remain, including insufficient semantic richness of generated traffic elements, limited camera viewpoints, low visual fidelity of synthesized images, and poor alignment between textual descriptions and generated content. To address these issues, we propose a unified text-driven framework for both image generation and editing, leveraging a controllable mask mechanism to seamlessly integrate the two tasks. Furthermore, we incorporate both vehicle-side and roadside multi-view data to enhance the geometric diversity of traffic scenes. Our training strategy follows a two-stage paradigm: first,…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques
