Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Lingjun Zhang; Xinyuan Chen; Yaohui Wang; Yue Lu; Yu Qiao

arXiv:2312.12232 · cs.CV · December 20, 2023 · 1 citation

TL;DR

This paper introduces Diff-Text, a training-free, diffusion-based framework that generates realistic scene text images in any language, improving both text accuracy and image naturalness.

Contribution

The paper presents a novel multilingual scene text generation method using diffusion models with localized attention constraints and contrastive prompts, without requiring additional training.

Findings

01. Outperforms existing methods in text recognition accuracy

02. Enhances the naturalness of foreground-background blending

03. Generates multilingual scene text images effectively

Abstract

Recently, diffusion-based image generation methods have been credited for their remarkable text-to-image generation capabilities, yet they still struggle to accurately generate multilingual scene text images. To tackle this problem, we propose Diff-Text, a training-free scene text generation framework for any language. Given a text in any language along with a textual description of a scene, our model outputs a photo-realistic image. The model leverages rendered sketch images as priors, thereby eliciting the latent multilingual generation ability of the pre-trained Stable Diffusion. Based on the observed influence of the cross-attention map on object placement in generated images, we introduce a localized attention constraint into the cross-attention layer to address the unreasonable positioning of scene text. Additionally, we introduce contrastive image-level…
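
The abstract does not give the exact form of the localized attention constraint, so the following is only a rough illustration of the general idea, not the paper's formulation. It assumes a single cross-attention head in which image positions attend over prompt tokens, and it zeroes the attention that positions outside a chosen region pay to selected tokens before renormalizing each row. The function name `localized_cross_attention`, the parameters `region_mask` and `token_ids`, and the renormalization step are all hypothetical choices made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def localized_cross_attention(q, k, region_mask, token_ids):
    """Illustrative (hypothetical) localized constraint on cross-attention.

    q:           (num_positions, d) image-side queries
    k:           (num_tokens, d) prompt-side keys
    region_mask: (num_positions,) bool, True inside the allowed region
    token_ids:   indices of prompt tokens to confine to the region
    """
    num_tokens = k.shape[0]
    # At least one token must stay unconstrained so every row can be renormalized.
    assert len(set(token_ids)) < num_tokens

    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1 over tokens

    constrained = attn.copy()
    # Outside the region, remove attention to the selected tokens.
    constrained[np.ix_(~region_mask, token_ids)] = 0.0
    # Renormalize so each position still has a valid distribution over tokens.
    constrained /= constrained.sum(axis=-1, keepdims=True)
    return constrained

# Toy usage: a 4x4 latent grid, 5 prompt tokens, token 2 confined to a 2x2 box.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k = rng.normal(size=(5, 8))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
attn = localized_cross_attention(q, k, mask.ravel(), [2])
```

In a real pipeline the mask would have to be resized to the resolution of each attention layer, and the official repository listed under Code & Models is the place to check how the authors actually apply the constraint.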

Code & Models

Repositories

ecnuljzhang/brush-your-text (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation

Methods: Diffusion