ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models

Benjamin Clavié; Florian Brand

arXiv:2505.19091·cs.CL·May 27, 2025

TL;DR

ReadBench is a new benchmark designed to evaluate vision-language models' ability to read and understand text-rich images, revealing significant performance gaps in handling extensive textual content.

Contribution

This paper introduces ReadBench, the first benchmark specifically assessing VLMs' reading comprehension of text-rich images, highlighting current limitations and areas for improvement.

Findings

01. VLMs show a minimal but present performance drop on short text-image inputs

02. Performance declines sharply with longer, multi-page contexts

03. Text resolution has little impact on model performance

Abstract

Recent advancements in Large Vision-Language Models (VLMs) have greatly enhanced their capability to jointly process text and images. However, despite extensive benchmarks evaluating visual comprehension (e.g., diagrams, color schemes, OCR tasks...), there is limited assessment of VLMs' ability to read and reason about text-rich images effectively. To fill this gap, we introduce ReadBench, a multimodal benchmark specifically designed to evaluate the reading comprehension capabilities of VLMs. ReadBench transposes contexts from established text-only benchmarks into images of text while keeping textual prompts and questions intact. Evaluating leading VLMs with ReadBench, we find minimal-but-present performance degradation on short text-image inputs, while performance sharply declines for longer, multi-page contexts. Our experiments further reveal that text resolution has negligible…
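The transposition step described in the abstract (context becomes images of text, the question stays as text) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the names `ImageContextItem`, `paginate`, and `to_image_item`, and the layout parameters `chars_per_line` and `lines_per_page`, are hypothetical, and the actual rendering of each page to an image is left out. The sketch only shows how a long context might be split into page-sized chunks before rendering, which is where multi-page inputs come from.

```python
from __future__ import annotations

import textwrap
from dataclasses import dataclass


@dataclass
class ImageContextItem:
    """Hypothetical benchmark item: the context is held as page texts
    (each page would later be rendered to an image); the question stays text."""
    pages: list[str]
    question: str


def paginate(context: str, chars_per_line: int = 80, lines_per_page: int = 40) -> list[str]:
    """Wrap the context to a fixed width and group the wrapped lines into pages."""
    lines: list[str] = []
    for paragraph in context.splitlines():
        # textwrap.wrap returns [] for an empty string, so keep blank lines explicitly.
        lines.extend(textwrap.wrap(paragraph, width=chars_per_line) or [""])
    return [
        "\n".join(lines[i:i + lines_per_page])
        for i in range(0, len(lines), lines_per_page)
    ]


def to_image_item(context: str, question: str, **layout: int) -> ImageContextItem:
    """Build an item whose context is paginated and whose question is left intact."""
    return ImageContextItem(pages=paginate(context, **layout), question=question)
```

A short context yields a single page, while a long one yields several; under this assumed layout, the number of pages grows linearly with context length, which is the regime where the abstract reports the sharpest decline.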

Code & Models

Repositories

answerdotai/readbench (Official)

Taxonomy

Topics: Text Readability and Simplification · Multimodal Machine Learning Applications · Visual and Cognitive Learning Processes