STRICT: Stress Test of Rendering Images Containing Text

Tianyu Zhang; Xinyu Wang; Lu Li; Zhenghan Tai; Jijun Chi; Jingrui Tian; Hailin He; Suyuchen Wang

arXiv:2505.18985 · cs.LG · September 16, 2025

TL;DR

This paper introduces STRICT, a benchmark for evaluating diffusion models' ability to generate coherent, legible, and instruction-following text within images, revealing persistent limitations in current models.

Contribution

The paper presents a systematic benchmark to stress-test diffusion models' text rendering capabilities and analyzes their limitations in long-range consistency and instruction adherence.

Findings

01. Models struggle with long-range text coherence.

02. Persistent issues with text legibility and correctness.

03. Limitations in following complex text instructions.

Abstract

While diffusion models have revolutionized text-to-image generation with their ability to synthesize realistic and diverse scenes, they continue to struggle to generate consistent and legible text within images. This shortcoming is commonly attributed to the locality bias inherent in diffusion-based generation, which limits their ability to model long-range spatial dependencies. In this paper, we introduce STRICT, a benchmark designed to systematically stress-test the ability of diffusion models to render coherent and instruction-aligned text in images. Our benchmark evaluates models across multiple dimensions: (1) the maximum length of readable text that can be generated; (2) the correctness and legibility of the generated text; and (3) the rate at which models fail to follow the instructions for generating text. We evaluate several state-of-the-art models, including proprietary and…
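The three dimensions listed in the abstract can be illustrated with a small scoring sketch. The abstract excerpt does not state which metrics the benchmark actually uses, so everything below is an assumption made for illustration: correctness is approximated by a normalized edit distance between the requested text and the text read back from the image, and instruction failure is approximated by the share of samples in which no text was recovered at all. The function names are hypothetical and are not taken from the STRICT repository.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_edit_distance(target: str, recognized: str) -> float:
    """Edit distance divided by the longer length; 0.0 means an exact match."""
    longest = max(len(target), len(recognized))
    if longest == 0:
        return 0.0
    return edit_distance(target, recognized) / longest


def failure_ratio(recognized_texts: list[str]) -> float:
    """Assumed proxy: fraction of samples where no text was recovered."""
    if not recognized_texts:
        return 0.0
    failures = sum(1 for text in recognized_texts if not text.strip())
    return failures / len(recognized_texts)
```

Under these assumptions, the maximum readable length could be estimated by sweeping the requested text length and recording the longest length at which the normalized edit distance stays below a chosen threshold. The text-recovery step (for example, an OCR pass over the generated image) is left out here because the excerpt does not describe it.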

Code & Models

Repositories

tianyu-z/strict-bench (Official)

Taxonomy

Topics: Handwritten Text Recognition Techniques

Methods: Diffusion