ReadBench: Measuring the Dense Text Visual Reading Ability of Vision-Language Models
Benjamin Clavié, Florian Brand

TL;DR
ReadBench is a new benchmark designed to evaluate vision-language models' ability to read and understand text-rich images, revealing significant performance gaps in handling extensive textual content.
Contribution
This paper introduces ReadBench, the first benchmark specifically assessing VLMs' reading comprehension of text-rich images, highlighting current limitations and areas for improvement.
Findings
VLMs show minimal performance drop on short text-image inputs
Performance declines sharply with longer, multi-page contexts
Text resolution has little impact on model performance
Abstract
Recent advancements in Large Vision-Language Models (VLMs) have greatly enhanced their capability to jointly process text and images. However, despite extensive benchmarks evaluating visual comprehension (e.g., diagrams, color schemes, OCR tasks), there is limited assessment of VLMs' ability to read and reason about text-rich images effectively. To fill this gap, we introduce ReadBench, a multimodal benchmark specifically designed to evaluate the reading comprehension capabilities of VLMs. ReadBench transposes contexts from established text-only benchmarks into images of text while keeping textual prompts and questions intact. Evaluating leading VLMs with ReadBench, we find minimal but present performance degradation on short text-image inputs, while performance declines sharply for longer, multi-page contexts. Our experiments further reveal that text resolution has negligible…
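The transposition step described in the abstract (context becomes images of text, the question stays textual) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `ImageContextItem` structure, the `paginate` and `transpose` helpers, and the layout parameters (`chars_per_line`, `lines_per_page`) are all hypothetical. Rendering each page to pixels is left out, since it would need an imaging library. The sketch only shows how a long context naturally becomes a multi-page input while a short one fits on a single page.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class ImageContextItem:
    """One benchmark item after transposition (hypothetical structure)."""
    question: str            # stays as plain text, untouched
    pages: list[list[str]]   # each page = wrapped lines that would be drawn as one image


def paginate(context: str, chars_per_line: int = 80, lines_per_page: int = 40) -> list[list[str]]:
    """Wrap the context into fixed-width lines, then group the lines into pages."""
    lines: list[str] = []
    for paragraph in context.splitlines():
        # textwrap.wrap returns [] for a blank paragraph; keep it as an empty line
        lines.extend(textwrap.wrap(paragraph, width=chars_per_line) or [""])
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]


def transpose(context: str, question: str, **layout: int) -> ImageContextItem:
    """Move the context to the (to-be-rendered) image side; leave the question as text."""
    return ImageContextItem(question=question, pages=paginate(context, **layout))


# Usage: a short passage fits on one page, a long one spills onto several.
short_item = transpose("A short passage.", "What is this passage about?")
long_item = transpose("word " * 1000, "How many words are there?")
print(len(short_item.pages), len(long_item.pages))
```

Under these assumed layout values, the number of pages grows with context length, which is the axis along which the abstract reports the sharpest performance decline.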
Taxonomy
Topics: Text Readability and Simplification · Multimodal Machine Learning Applications · Visual and Cognitive Learning Processes
