FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan, Yancheng Bai, Xu Duan, Mingxing Li, Dongyang Jin, Ryan Xu, Dong Nie, Lei Sun, Xiangxiang Chu

TL;DR
FLUX-Text is a multilingual diffusion transformer (DiT) method for scene text editing. It adds lightweight glyph embedding modules to FLUX, improving glyph understanding and cutting training data to 0.1M examples while maintaining visual quality.
Contribution
The paper presents FLUX-Text, a DiT-based scene text editing method built on FLUX. It adds lightweight Visual and Text Embedding Modules and a Regional Text Perceptual Loss with a matching two-stage training strategy, enabling multilingual scene text editing from only 0.1M training examples.
Findings
Outperforms existing methods in visual quality and text fidelity.
Requires only 0.1M training examples, a 97% reduction from previous methods.
Effective on English and Chinese benchmarks.
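As a quick sanity check on the data-efficiency finding, the two figures quoted above (0.1M examples, a 97% reduction) imply a prior training-set size of roughly 3.3M examples. This page does not state that baseline directly, so the value below is only what the arithmetic implies; the function name is hypothetical.

```python
def implied_baseline_size(reduced_size: float, reduction_fraction: float) -> float:
    """Return the original size implied by a reduced size and a fractional reduction.

    If reduced = baseline * (1 - reduction), then baseline = reduced / (1 - reduction).
    """
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return reduced_size / (1.0 - reduction_fraction)


# Figures quoted in the findings: 0.1M examples, 97% reduction.
baseline = implied_baseline_size(100_000, 0.97)
print(f"Implied prior training-set size: about {baseline / 1e6:.1f}M examples")
```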
Abstract
Scene text editing aims to modify or add texts on images while ensuring text fidelity and overall visual quality consistent with the background. Recent methods are primarily built on UNet-based diffusion models, which have improved scene text editing results, but still struggle with complex glyph structures, especially for non-Latin ones (e.g., Chinese, Korean, Japanese). To address these issues, we present FLUX-Text, a simple and advanced multilingual scene text editing DiT method. Specifically, our FLUX-Text enhances glyph understanding and generation through lightweight Visual and Text Embedding Modules, while preserving the original generative capability of FLUX. We further propose a Regional Text Perceptual Loss tailored for text regions, along with a matching two-stage training strategy to better balance text editing and overall image quality. Benefiting from the DiT-based…
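The abstract names a Regional Text Perceptual Loss but does not define it here. The sketch below shows one plausible reading, not the authors' formulation: a feature-space distance that is averaged only over pixels inside a text-region mask. Everything in it is an assumption made for illustration, including the function names, the mask convention (1 inside the text region), and the toy feature extractor, which uses finite differences in place of the pretrained network a real perceptual loss would normally use.

```python
import numpy as np


def toy_features(img: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor (assumption): horizontal and vertical
    finite differences of a 2-D grayscale image. Returns shape (2, H, W)."""
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    return np.stack([dx, dy], axis=0)


def regional_perceptual_loss(pred, target, text_mask, eps=1e-8):
    """Mean squared feature difference, restricted to the masked text region.

    text_mask is a (H, W) array with 1.0 inside the text region and 0.0 elsewhere.
    """
    diff = (toy_features(pred) - toy_features(target)) ** 2  # (2, H, W)
    masked = diff * text_mask[None, :, :]                    # zero outside the region
    return masked.sum() / (text_mask.sum() * diff.shape[0] + eps)


rng = np.random.default_rng(0)
target = rng.random((8, 8))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0  # hypothetical text box

pred_outside = target.copy()
pred_outside[0, 0] += 1.0  # error far from the text box
pred_inside = target.copy()
pred_inside[3, 3] += 1.0   # error inside the text box

loss_outside = regional_perceptual_loss(pred_outside, target, mask)
loss_inside = regional_perceptual_loss(pred_inside, target, mask)
print(loss_outside, loss_inside)
```

In this toy setup an error outside the text box contributes nothing to the loss, while an error inside it does. That matches the stated intent of a loss tailored to text regions; how such a term would be weighted against a whole-image objective across the two training stages is not specified in the text above.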
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Power Systems and Technologies
Methods: Diffusion
