UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

Yuanrui Wang; Cong Han; Yafei Li; Zhipeng Jin; Xiawei Li; SiNan Du; Wen Tao; Yi Yang; Shuanglong Li; Chun Yuan; Liu Lin

arXiv:2507.00992 · cs.CV · July 3, 2025

TL;DR

This paper introduces UniGlyph, a segmentation-guided diffusion framework that conditions image generation on pixel-level text masks to improve the accuracy and style fidelity of rendered visual text, outperforming prior methods on the AnyText benchmark.

Contribution

The paper presents a novel unified conditional input using pixel-level text masks and a diffusion model with adaptive glyph conditioning, achieving state-of-the-art results in text-to-image synthesis.

Findings

01 Outperforms prior methods on the AnyText benchmark

02 Excels at small text rendering and complex layout preservation

03 Introduces new benchmarks for layout and style evaluation

Abstract

Text-to-image generation has greatly advanced content creation, yet accurately rendering visual text remains a key challenge due to blurred glyphs, semantic drift, and limited style control. Existing methods often rely on pre-rendered glyph images as conditions, but these struggle to retain original font styles and color cues, necessitating complex multi-branch designs that increase model overhead and reduce flexibility. To address these issues, we propose a segmentation-guided framework that uses pixel-level visual text masks -- rich in glyph shape, color, and spatial detail -- as unified conditional inputs. Our method introduces two core components: (1) a fine-tuned bilingual segmentation model for precise text mask extraction, and (2) a streamlined diffusion model augmented with adaptive glyph conditioning and a region-specific loss to preserve textual fidelity in both content and…
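The abstract names a region-specific loss but does not give its form here. As a rough illustration of the general idea only, the sketch below weights the per-pixel squared error more heavily inside a binary text mask. The function name `region_weighted_mse`, the `text_weight` parameter, and the weighting scheme are all assumptions made for this example, not the authors' formulation.

```python
def region_weighted_mse(pred, target, text_mask, text_weight=2.0):
    """Mean squared error in which text-mask pixels count more.

    Hypothetical sketch, not the paper's loss.
    pred, target: flat sequences of per-pixel values
        (for example, predicted and true noise).
    text_mask: flat sequence of 0/1 flags, 1 marking a text pixel.
    text_weight: assumed extra weight applied to text pixels.
    """
    if not (len(pred) == len(target) == len(text_mask)):
        raise ValueError("pred, target and text_mask must have the same length")

    weighted_sum = 0.0
    weight_total = 0.0
    for p, t, m in zip(pred, target, text_mask):
        w = text_weight if m else 1.0  # background pixels keep weight 1
        weighted_sum += w * (p - t) ** 2
        weight_total += w

    # Normalise by the total weight so the value stays comparable
    # across images with different amounts of text.
    return weighted_sum / weight_total if weight_total else 0.0


# Same size of error, once on a text pixel and once on a background pixel.
mask = [1, 0, 0, 0]
loss_on_text = region_weighted_mse([1, 0, 0, 0], [0, 0, 0, 0], mask, text_weight=3.0)
loss_on_background = region_weighted_mse([0, 1, 0, 0], [0, 0, 0, 0], mask, text_weight=3.0)
```

With `text_weight` above 1, an error on a text pixel contributes more to the loss than the same error on a background pixel, which is the kind of behaviour a region-specific term is presumably meant to encourage.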

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Diffusion