TextMastero: Mastering High-Quality Scene Text Editing in Diverse Languages and Styles

Tong Wang; Xiaochao Qu; Ting Liu

arXiv:2408.10623 · cs.CV · August 21, 2024


Open Access

TL;DR

TextMastero is a novel multilingual scene text editing framework based on latent diffusion models that significantly improves text accuracy and style preservation, especially for complex scripts like CJK characters.

Contribution

It introduces glyph conditioning and latent guidance modules to enhance text fidelity and style consistency in scene text editing across diverse languages and styles.

Findings

01. Outperforms existing methods in text fidelity.

02. Achieves superior style similarity in edited images.

03. Handles complex scripts like CJK effectively.

Abstract

Scene text editing aims to modify text in images while keeping the style of the newly generated text similar to the original. Given an image, a target area, and target text, the task produces an output image in which the target text replaces the original text in the selected area. This task has been studied extensively, with initial success using Generative Adversarial Networks (GANs) to balance text fidelity and style similarity. However, GAN-based methods struggled with complex backgrounds or text styles. Recent works leverage diffusion models and show improved results, yet they still face challenges, especially with non-Latin scripts such as CJK (Chinese, Japanese, Korean) characters, whose complex glyphs often come out inaccurate or unrecognizable. To address these issues, we present TextMastero, a carefully designed multilingual scene text editing architecture based on…
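The task definition in the abstract (an image, a target area, and target text as inputs) can be made concrete with a minimal sketch. The names `EditRequest` and `region_mask` are hypothetical and do not come from the paper. Representing the target area as a binary mask is an assumption borrowed from common inpainting-style setups, not a description of how TextMastero itself encodes the region.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EditRequest:
    """Hypothetical container for the three task inputs named in the abstract."""
    image_size: Tuple[int, int]             # (height, width) of the source image
    target_box: Tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive
    target_text: str                        # text that should replace the original


def region_mask(req: EditRequest) -> List[List[int]]:
    """Return a binary mask: 1 inside the target area, 0 elsewhere.

    Assumption: the target area is an axis-aligned rectangle. Real scene
    text regions are often quadrilaterals or polygons.
    """
    height, width = req.image_size
    top, left, bottom, right = req.target_box
    if not (0 <= top < bottom <= height and 0 <= left < right <= width):
        raise ValueError("target_box must lie inside the image")
    return [
        [1 if top <= r < bottom and left <= c < right else 0 for c in range(width)]
        for r in range(height)
    ]


# Example: a 4x6 image whose 2x3 region should be rewritten with CJK text.
req = EditRequest(image_size=(4, 6), target_box=(1, 2, 3, 5), target_text="你好")
mask = region_mask(req)
```

An editing model would take the image, a mask like this one, and `target_text`, and return an image of the same size. That model is outside the scope of this sketch.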


Taxonomy

Topics: Video Analysis and Summarization · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Diffusion