DreamText: High Fidelity Scene Text Synthesis

Yibin Wang; Weizhong Zhang; Honghui Xu; Cheng Jin

arXiv:2405.14701 · cs.CV · March 25, 2025

Open Access · 1 Repo

TL;DR

DreamText is a high-fidelity scene text synthesis method that restructures diffusion training with character-level guidance and trains the text encoder jointly with the generator, targeting the character distortion, repetition, and omission seen in prior methods.

Contribution

The paper proposes a hybrid optimization approach with joint training of text encoder and generator to enhance character-level accuracy and font diversity in scene text synthesis.

Findings

01. Outperforms state-of-the-art methods qualitatively.

02. Achieves higher accuracy in character rendering.

03. Effectively handles diverse font styles.

Abstract

Scene text synthesis involves rendering specified texts onto arbitrary images. Current methods typically formulate this task in an end-to-end manner but lack effective character-level guidance during training. Besides, their text encoders, pre-trained on a single font type, struggle to adapt to the diverse font styles encountered in practical applications. Consequently, these methods suffer from character distortion, repetition, and absence, particularly in polystylistic scenarios. To this end, this paper proposes DreamText for high-fidelity scene text synthesis. Our key idea is to reconstruct the diffusion training process, introducing more refined guidance tailored to this task, to expose and rectify the model's attention at the character level and strengthen its learning of text regions. This transformation poses a hybrid optimization challenge, involving both discrete and continuous…

Code & Models

Repositories

codegoat24/dreamtext (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Music and Audio Processing · Human Motion and Animation

Methods: Diffusion