Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
Fenfen Lin, Yesheng Liu, Haiyu Xu, Chen Yue, Zheqi He, Mingxuan Zhao, Miguel Hu Chen, Jiakang Liu, JG Yao, Xi Yang

TL;DR
This paper introduces MeasureBench, a comprehensive benchmark for evaluating vision-language models on visual measurement reading tasks, revealing current models' limitations and exploring reinforcement finetuning improvements.
Contribution
The work provides a new benchmark with a scalable data synthesis pipeline and demonstrates the challenges and potential improvements for VLMs in precise spatial and numeracy tasks.
Findings
VLMs struggle with measurement reading tasks.
Reinforcement finetuning improves performance on synthetic and real images.
Fundamental limitations in fine-grained spatial grounding of VLMs.
Abstract
Reading measurement instruments is effortless for humans and requires relatively little domain expertise, yet it remains surprisingly challenging for current vision-language models (VLMs) as we find in preliminary evaluation. In this work, we introduce MeasureBench, a benchmark on visual measurement reading covering both real-world and synthesized images of various types of measurements, along with an extensible pipeline for data synthesis. Our pipeline procedurally generates a specified type of gauge with controllable visual appearance, enabling scalable variation in key details such as pointers, scales, fonts, lighting, and clutter. Evaluation on popular proprietary and open-weight VLMs shows that even the strongest frontier VLMs struggle with measurement reading in general. We have also conducted preliminary experiments with reinforcement finetuning (RFT) over synthetic data, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Natural Language Processing Techniques · Data Visualization and Analytics
