EncQA: Benchmarking Vision-Language Models on Visual Encodings for Charts

Kushin Mukherjee; Donghao Ren; Dominik Moritz; Yannick Assogba

arXiv:2508.04650 · cs.CV · December 9, 2025

TL;DR

EncQA is a benchmark of 2,076 synthetic question-answer pairs that tests vision-language models across six visual encoding channels and eight analytic tasks for chart understanding, exposing gaps in current models' visual reasoning.

Contribution

We present EncQA, a new benchmark of synthetic question-answer pairs that gives balanced coverage of six visual encoding channels and eight analytic tasks, and we use it to show where current models' visual reasoning over charts falls short.

Findings

01 Model performance varies across encodings and tasks.

02 Scaling model size does not always improve performance.

03 Targeted strategies are needed to address reasoning gaps.

Abstract

Multimodal vision-language models (VLMs) continue to achieve ever-improving scores on chart understanding benchmarks. Yet, we find that this progress does not fully capture the breadth of visual reasoning capabilities essential for interpreting charts. We introduce EncQA, a novel benchmark informed by the visualization literature, designed to provide systematic coverage of visual encodings and analytic tasks that are crucial for chart understanding. EncQA provides 2,076 synthetic question-answer pairs, enabling balanced coverage of six visual encoding channels (position, length, area, color quantitative, color nominal, and shape) and eight tasks (find extrema, retrieve value, find anomaly, filter values, compute derived value exact, compute derived value relative, correlate values, and correlate values relative). Our evaluation of 9 state-of-the-art VLMs reveals that performance varies…
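The per-encoding, per-task breakdown that the abstract describes can be illustrated with a small Python sketch. The channel and task names come from the abstract; the record format `(encoding, task, is_correct)`, the function name `accuracy_by_cell`, and the toy records are assumptions made for illustration, not the benchmark's actual data format, evaluation code, or results.

```python
from collections import defaultdict

# Encoding channels and tasks as named in the abstract.
ENCODINGS = [
    "position", "length", "area",
    "color quantitative", "color nominal", "shape",
]
TASKS = [
    "find extrema", "retrieve value", "find anomaly", "filter values",
    "compute derived value exact", "compute derived value relative",
    "correlate values", "correlate values relative",
]


def accuracy_by_cell(records):
    """Return accuracy per (encoding, task) pair.

    `records` is an iterable of (encoding, task, is_correct) tuples.
    This format is hypothetical, chosen only for the sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [num_correct, num_seen]
    for encoding, task, is_correct in records:
        if encoding not in ENCODINGS or task not in TASKS:
            raise ValueError(f"unknown cell: {encoding!r}, {task!r}")
        cell = totals[(encoding, task)]
        cell[0] += int(is_correct)
        cell[1] += 1
    return {key: correct / seen for key, (correct, seen) in totals.items()}


# Toy records, invented for illustration (not real model outputs).
records = [
    ("position", "find extrema", True),
    ("position", "find extrema", False),
    ("shape", "retrieve value", True),
]
scores = accuracy_by_cell(records)
for (encoding, task), acc in sorted(scores.items()):
    print(f"{encoding:20s} {task:30s} {acc:.2f}")
```

Reporting accuracy per cell rather than as one aggregate score is what lets a finding such as "performance varies across encodings and tasks" become visible; a single average would hide it.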
