First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending
Zhenhang Li, Yan Shu, Weichao Zeng, Dongbao Yang, Yu Zhou

TL;DR
This paper introduces a new visual text blending paradigm that first creates high-quality backgrounds and then renders text onto them, improving control, diversity, and fidelity in text-image synthesis.
Contribution
The paper proposes a two-stage approach that separates background generation from text rendering: a background generator that produces text-free images, and GlyphOnly, a text renderer based on Stable Diffusion. Applications include scene text dataset synthesis and editing.
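To make the "background first, text second" ordering concrete, here is a minimal toy sketch of the two-stage interface. It is not the paper's implementation: `create_background`, `render_text`, and `GLYPH_BAR` are hypothetical names, the background is a simple gradient rather than a generated image, and the renderer is plain alpha blending rather than a diffusion model. It only illustrates that the renderer receives a finished, text-free background plus an explicit position, which is where the precise control comes from.

```python
from typing import List

Image = List[List[float]]  # grayscale image, values in [0, 1]


def create_background(width: int, height: int) -> Image:
    """Stand-in for a background generator: a text-free horizontal gradient."""
    return [[x / max(width - 1, 1) for x in range(width)] for _ in range(height)]


def render_text(background: Image, glyph_mask: Image, top: int, left: int,
                ink: float = 0.0) -> Image:
    """Stand-in for a text renderer: alpha-blend a glyph mask at (top, left)."""
    out = [row[:] for row in background]  # copy so the background stays intact
    for dy, mask_row in enumerate(glyph_mask):
        for dx, alpha in enumerate(mask_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = (1.0 - alpha) * out[y][x] + alpha * ink
    return out


# Hypothetical 3x3 glyph mask (a vertical bar); 1.0 means fully inked.
GLYPH_BAR: Image = [[0.0, 1.0, 0.0] for _ in range(3)]

# Stage 1: create the background. Stage 2: render text onto it.
background = create_background(8, 4)
blended = render_text(background, GLYPH_BAR, top=0, left=2)
```

Because the two stages share only an image and a placement, either stand-in could in principle be swapped for a learned model without changing the calling code.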
Findings
Generated high-fidelity, diverse backgrounds for text blending
Achieved visually plausible text-background integration with GlyphOnly
Enhanced downstream tasks like scene text detection and editing
Abstract
Diffusion models, known for their impressive image generation abilities, have played a pivotal role in the rise of visual text generation. Nevertheless, existing visual text generation methods often focus on generating entire images with text prompts, leading to imprecise control and limited practicality. A more promising direction is visual text blending, which focuses on seamlessly merging texts onto text-free backgrounds. However, existing visual text blending methods often struggle to generate high-fidelity and diverse images due to a shortage of backgrounds for synthesis and limited generalization capabilities. To overcome these challenges, we propose a new visual text blending paradigm including both creating backgrounds and rendering texts. Specifically, a background generator is developed to produce high-fidelity and text-free natural images. Moreover, a text renderer named…
Taxonomy
Topics: Subtitles and Audiovisual Media
Methods: Focus · Diffusion
