How Control Information Influences Multilingual Text Image Generation and Editing?
Boqiang Zhang, Zuan Gao, Yadong Qu, Hongtao Xie

TL;DR
This paper investigates how control information affects multilingual text image generation and editing, proposing a new framework that enhances quality by optimizing control features and employing a two-stage process.
Contribution
It introduces TextGen, a novel framework that improves multilingual text image generation by analyzing control information's role and optimizing input/output features with Fourier analysis.
Findings
Input control information has unique characteristics compared to conventional control inputs such as Canny edges and depth maps.
Control information plays distinct roles at different stages of the denoising process.
Output control features differ significantly in the frequency domain from the base and skip features of the U-Net decoder.
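The frequency-domain comparison in the last finding can be illustrated with a small sketch. This is not the paper's analysis code: the function name `high_freq_energy_ratio`, the `cutoff` parameter, and the two synthetic arrays standing in for feature maps are all assumptions made for illustration. The sketch measures what fraction of a 2D feature map's spectral energy lies above a chosen radial frequency, which is one simple way to check whether two sets of features differ in frequency content.

```python
import numpy as np


def high_freq_energy_ratio(feature, cutoff=0.25):
    """Fraction of spectral energy above `cutoff`.

    `cutoff` is a hypothetical parameter, expressed as a fraction of the
    per-axis Nyquist frequency (0 = DC, 1 = Nyquist along one axis).
    `feature` is an array whose last two axes are height and width.
    """
    feature = np.asarray(feature, dtype=np.float64)
    h, w = feature.shape[-2:]

    # 2D FFT over the spatial axes, with the DC component shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(feature, axes=(-2, -1)), axes=(-2, -1))
    power = np.abs(spectrum) ** 2

    # Radial frequency of every bin, normalized so the per-axis Nyquist is 1.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2) / 0.5

    total = power.sum()
    if total == 0:
        return 0.0
    return float((power * (radius > cutoff)).sum() / total)


# Synthetic stand-ins (assumptions, not real U-Net or control features):
# a constant map has only a DC component, a checkerboard is purely high-frequency.
smooth = np.ones((16, 16))
rows, cols = np.indices((16, 16))
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)

smooth_ratio = high_freq_energy_ratio(smooth)
checker_ratio = high_freq_energy_ratio(checker)
print(f"smooth: {smooth_ratio:.3f}, checkerboard: {checker_ratio:.3f}")
```

In practice, the same function could be applied to feature maps captured from a model (for example, via forward hooks) and averaged over channels; how the paper itself measures the difference is not specified in this summary.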
Abstract
Visual text generation has significantly advanced through diffusion models aimed at producing images with readable and realistic text. Recent works primarily use a ControlNet-based framework, employing standard font text images to control diffusion models. Recognizing the critical role of control information in generating high-quality text, we investigate its influence from three perspectives: input encoding, role at different stages, and output features. Our findings reveal that: 1) Input control information has unique characteristics compared to conventional inputs like Canny edges and depth maps. 2) Control information plays distinct roles at different stages of the denoising process. 3) Output control features significantly differ from the base and skip features of the U-Net decoder in the frequency domain. Based on these insights, we propose TextGen, a novel framework designed to…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net · Diffusion · Balanced Selection · ALIGN
