GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing

Tong Wang; Ting Liu; Xiaochao Qu; Chengjing Wu; Luoqi Liu; Xiaolin Hu

arXiv:2505.04915 · cs.CV · May 9, 2025

TL;DR

GlyphMastero introduces a glyph encoder with cross-level and multi-scale modeling to enhance high-fidelity scene text editing, especially for complex characters, by guiding diffusion models with stroke-level precision.

Contribution

It presents a novel glyph attention module and feature pyramid network to better capture hierarchical text structures for improved scene text editing.
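To make the idea of cross-level interaction concrete, here is a minimal sketch of scaled dot-product cross-attention in which character-level features attend to line-level features. Everything in it is an assumption for illustration: this page does not specify the glyph attention module, so the function names, feature shapes, and the choice of characters as queries are hypothetical, not the authors' design.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention (not the paper's exact module).

    queries: (n_q, d) array, keys/values: (n_k, d) arrays.
    Returns the attended features (n_q, d) and the attention weights (n_q, n_k).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # similarity of each query to each key
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ values, weights


# Hypothetical shapes: 5 character-level (local) feature vectors and
# 8 positions of a line-level (global) feature map, both of width 16.
rng = np.random.default_rng(0)
char_feats = rng.normal(size=(5, 16))
line_feats = rng.normal(size=(8, 16))

# Assumed direction: characters query the text line.
fused, weights = cross_attention(char_feats, line_feats, line_feats)
```

Each row of `weights` shows how strongly one character draws on each line-level position. A multi-scale variant would repeat this over feature maps at several resolutions, which is the role a feature pyramid network typically plays.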

Findings

01. Achieves 18.02% higher sentence accuracy than previous methods.

02. Reduces text-region Fréchet inception distance by 53.28%.

03. Enhances stroke-level precision in multi-lingual scene text editing.
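For readers unfamiliar with the first metric, the sketch below shows one common way sentence accuracy is computed in scene text editing evaluations: the fraction of edited text lines that an OCR model reads back exactly as the target string. This page does not define the metric, so the exact-match definition, the function name, and the sample strings are assumptions for illustration only.

```python
def sentence_accuracy(recognized, targets):
    """Fraction of lines whose recognized text equals the target exactly.

    Assumed definition (exact string match per line); the paper's
    evaluation protocol may differ in normalization details.
    """
    if len(recognized) != len(targets):
        raise ValueError("recognized and targets must have the same length")
    if not targets:
        return 0.0
    matches = sum(r == t for r, t in zip(recognized, targets))
    return matches / len(targets)


# Hypothetical OCR outputs for four edited images and their intended texts.
recognized = ["咖啡店", "OPEN", "出口", "SALE"]
targets = ["咖啡店", "OPEN", "入口", "SALE"]

print(sentence_accuracy(recognized, targets))  # 0.75
```

Under this definition a single wrong character fails the whole line, which is why the metric is sensitive to the distorted glyphs described in the abstract.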

Abstract

Scene text editing, a subfield of image editing, requires modifying text in images while preserving style consistency and visual coherence with the surrounding environment. While diffusion-based methods have shown promise in text generation, they still struggle to produce high-quality results. These methods often generate distorted or unrecognizable characters, particularly when dealing with complex characters like Chinese. In such writing systems, characters are composed of intricate stroke patterns and spatial relationships that must be precisely maintained. We present GlyphMastero, a specialized glyph encoder designed to guide the latent diffusion model for generating text with stroke-level precision. Our key insight is that existing methods, despite using pretrained OCR models for feature extraction, fail to capture the hierarchical nature of text structures - from individual strokes to…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques

Methods: Softmax · Attention Is All You Need · Latent Diffusion Model · Diffusion