First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending

Zhenhang Li; Yan Shu; Weichao Zeng; Dongbao Yang; Yu Zhou

arXiv:2410.10168 · cs.CV · October 15, 2024

TL;DR

This paper introduces a new visual text blending paradigm that first creates high-quality backgrounds and then renders text onto them, improving control, diversity, and fidelity in text-image synthesis.

Contribution

The paper proposes a two-stage approach to visual text blending: a background generator that produces text-free images, and GlyphOnly, a text renderer based on Stable Diffusion. Applications include scene text dataset synthesis and scene text editing.

Findings

01. Generated high-fidelity, diverse backgrounds for text blending

02. Achieved visually plausible text-background integration with GlyphOnly

03. Enhanced downstream tasks like scene text detection and editing

Abstract

Diffusion models, known for their impressive image generation abilities, have played a pivotal role in the rise of visual text generation. Nevertheless, existing visual text generation methods often focus on generating entire images with text prompts, leading to imprecise control and limited practicality. A more promising direction is visual text blending, which focuses on seamlessly merging texts onto text-free backgrounds. However, existing visual text blending methods often struggle to generate high-fidelity and diverse images due to a shortage of backgrounds for synthesis and limited generalization capabilities. To overcome these challenges, we propose a new visual text blending paradigm including both creating backgrounds and rendering texts. Specifically, a background generator is developed to produce high-fidelity and text-free natural images. Moreover, a text renderer named…

Code & Models

Repositories

Zhenhang-Li/GlyphOnly
PyTorch · Official

Taxonomy

Topics: Subtitles and Audiovisual Media

Methods: Focus · Diffusion