TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

Hanshen Zhu; Yuliang Liu; Xuecheng Wu; An-Lan Wang; Hao Feng; Dingkang Yang; Chao Feng; Can Huang; Jingqun Tang; Xiang Bai

arXiv:2602.20903·cs.CV·February 27, 2026

TL;DR

This paper introduces TextPecker, a reinforcement learning strategy that perceives structural anomalies in rendered text and uses that signal to improve the structural fidelity of text in generated images.

Contribution

We develop a novel RL-based method with a character-level anomaly dataset and stroke-editing engine to improve structural accuracy in text-to-image models.

Findings

01. Achieves 4% improvement in structural fidelity for Chinese text
02. Yields 8.7% better semantic alignment
03. Establishes new state-of-the-art in high-fidelity visual text rendering

Abstract

Visual Text Rendering (VTR) remains a critical challenge in text-to-image generation, where even advanced models frequently produce text with structural anomalies such as distortion, blurriness, and misalignment. However, we find that leading MLLMs and specialist OCR models largely fail to perceive these structural anomalies, creating a critical bottleneck for both VTR evaluation and RL-based optimization. As a result, even state-of-the-art generators (e.g., Seedream4.0, Qwen-Image) still struggle to render structurally faithful text. To address this, we propose TextPecker, a plug-and-play, structural-anomaly-perceptive RL strategy that mitigates noisy reward signals and works with any text-to-image generator. To enable this capability, we construct a recognition dataset with character-level structural-anomaly annotations and develop a stroke-editing synthesis engine to expand…
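
One way to picture how a structure-aware reward differs from a purely OCR-based one is the toy sketch below. It is not the paper's reward: the `RecognizedChar` type, the `toy_reward` function, and the multiplicative combination of a semantic score with a structural score are all assumptions made for illustration, and the per-character anomaly flags are taken as given rather than predicted by any real model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    """One character read from a generated image (hypothetical recognizer output)."""
    char: str         # the character the recognizer believes it saw
    anomalous: bool   # True if the character is flagged as structurally malformed


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def toy_reward(target: str, recognized: List[RecognizedChar]) -> float:
    """Illustrative reward in [0, 1]: semantic alignment times structural quality.

    Assumption: both terms and their product are chosen for clarity only.
    """
    text = "".join(c.char for c in recognized)
    denom = max(len(target), len(text), 1)
    # Semantic term: 1 minus the normalized edit distance to the prompt text.
    semantic = 1.0 - edit_distance(target, text) / denom
    # Structural term: share of recognized characters not flagged as anomalous.
    if recognized:
        structural = 1.0 - sum(c.anomalous for c in recognized) / len(recognized)
    else:
        structural = 0.0
    return semantic * structural


if __name__ == "__main__":
    clean = [RecognizedChar(ch, False) for ch in "HELLO"]
    one_bad = [RecognizedChar(ch, i == 2) for i, ch in enumerate("HELLO")]
    print(toy_reward("HELLO", clean))
    print(toy_reward("HELLO", one_bad))
```

In this sketch, a rendering whose five characters are all read correctly but one of which is flagged as malformed scores about 0.8 rather than 1.0. A recognizer that cannot perceive structural anomalies would return the same reward for both renderings, which is the noisy-reward problem the abstract describes.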

Code & Models

Datasets

CIawevy/TextPecker-1.5M (dataset, 37k downloads)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications