HDGlyph: A Hierarchical Disentangled Glyph-Based Framework for Long-Tail Text Rendering in Diffusion Models

Shuhan Zhuang; Mengqi Huang; Fengyi Fu; Nan Chen; Bohan Lei; Zhendong Mao

arXiv:2505.06543 · cs.CV · May 13, 2025

TL;DR

HDGlyph introduces a hierarchical, disentangled framework that significantly improves long-tail text rendering in diffusion models, especially for unseen and small-sized text, by separating text generation from background synthesis.

Contribution

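The paper presents HDGlyph, a hierarchical disentangled glyph-based framework that improves long-tail text rendering in diffusion models by jointly optimizing common and long-tail text rendering during training and by applying Noise-Disentangled Classifier-Free Guidance and Latent-Disentangled Two-Stage Rendering at inference time.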

Findings

01 Achieves accuracy improvements of 5.08% and 11.7% in English and Chinese text rendering, respectively.

02 Outperforms existing methods in long-tail scenarios with better accuracy and visual quality.

03 Maintains high image quality while improving text rendering robustness.

Abstract

Visual text rendering, which aims to accurately integrate specified textual content within generated images, is critical for various applications such as commercial design. Despite recent advances, current methods struggle with long-tail text cases, particularly when handling unseen or small-sized text. In this work, we propose a novel Hierarchical Disentangled Glyph-Based framework (HDGlyph) that hierarchically decouples text generation from non-text visual synthesis, enabling joint optimization of both common and long-tail text rendering. At the training stage, HDGlyph disentangles pixel-level representations via the Multi-Linguistic GlyphNet and the Glyph-Aware Perceptual Loss, ensuring robust rendering even for unseen characters. At inference time, HDGlyph applies Noise-Disentangled Classifier-Free Guidance and the Latent-Disentangled Two-Stage Rendering (LD-TSR) scheme, which refines…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis