Visual Text Generation in the Wild
Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, and Zhibo Yang

TL;DR
This paper introduces SceneVTG, a visual text generator that combines multimodal language models and diffusion techniques to produce high-quality, scene-coherent text images in the wild that are also useful for downstream tasks.
Contribution
The paper presents SceneVTG, a two-stage framework integrating multimodal large language models and diffusion models for high-quality, scene-aware text image generation in real-world scenarios.
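The two-stage structure can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' code: `TextProposal`, `generate_text_image`, `toy_propose`, and `toy_render` are hypothetical names, and the toy functions are fixed rules standing in for the multimodal language model (stage one) and the diffusion model (stage two). The sketch shows only the control flow: first decide where text goes and what it says, then draw it into the scene.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive pixel coordinates
Image = List[List[int]]          # toy image: rows of integer pixel values


@dataclass
class TextProposal:
    """Hypothetical record: one text region and the string to place in it."""
    box: Box
    content: str


# Stage 1 stand-in: looks at a background and proposes regions and contents.
Proposer = Callable[[Image], List[TextProposal]]
# Stage 2 stand-in: draws one proposal into the image and returns the result.
Renderer = Callable[[Image, TextProposal], Image]


def generate_text_image(background: Image, propose: Proposer, render: Renderer):
    """Run the two stages in sequence and return (image, proposals)."""
    proposals = propose(background)
    image = [row[:] for row in background]  # copy so the input stays untouched
    for proposal in proposals:
        image = render(image, proposal)
    return image, proposals


def toy_propose(background: Image) -> List[TextProposal]:
    # Toy rule, NOT a language model: one fixed box in the top-left corner.
    return [TextProposal(box=(0, 0, 2, 1), content="EXIT")]


def toy_render(image: Image, proposal: TextProposal) -> Image:
    # Toy rule, NOT a diffusion model: mark the proposed region with 1s.
    x0, y0, x1, y1 = proposal.box
    out = [row[:] for row in image]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 1
    return out


if __name__ == "__main__":
    background = [[0] * 4 for _ in range(3)]
    image, proposals = generate_text_image(background, toy_propose, toy_render)
    print(proposals[0].content, image)
```

Keeping the two stages behind separate callables mirrors the split described above: the component that chooses regions and contents can be swapped independently of the component that renders them.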
Findings
Outperforms existing rendering-based and diffusion-based methods in fidelity and reasonability.
Generates images that improve text detection and recognition tasks.
Demonstrates superior utility of generated images in practical applications.
Abstract
Recently, with the rapid advancements of generative models, the field of visual text generation has witnessed significant progress. However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in the given conditions; (2) Reasonability: the regions and contents of the generated text should cohere with the scene; (3) Utility: the generated text images can facilitate related tasks (e.g., text detection and recognition). Upon investigation, we find that existing methods, either rendering-based or diffusion-based, can hardly meet all these aspects simultaneously, limiting their application range. Therefore, we propose in this paper a visual text generator (termed SceneVTG), which can…
Taxonomy
Topics: Digital Storytelling and Education
Methods: Diffusion
