ChartAB: A Benchmark for Chart Grounding & Dense Alignment
Aniruddh Bansal, Davit Soselia, Dang Nguyen, Tianyi Zhou

TL;DR
This paper introduces ChartAB, a comprehensive benchmark for evaluating vision-language models' ability to understand and compare charts, revealing their strengths and weaknesses in fine-grained chart perception.
Contribution
The paper presents a new benchmark with tailored evaluation metrics and a two-stage inference workflow for assessing chart grounding and comparison capabilities of VLMs.
Findings
VLMs exhibit perception biases and hallucinations in chart understanding.
Current models show weaknesses in fine-grained chart element extraction.
Evaluation reveals discrepancies in VLMs' ability to compare and align chart features.
Abstract
Charts play an important role in visualization, reasoning, data analysis, and the exchange of ideas among humans. However, existing vision-language models (VLMs) still lack accurate perception of details and struggle to extract fine-grained structures from charts. Such limitations in chart grounding also hinder their ability to compare multiple charts and reason over them. In this paper, we introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of VLMs in chart grounding tasks, i.e., extracting tabular data, localizing visualization elements, and recognizing various attributes from charts of diverse types and complexities. We design a JSON template to facilitate the calculation of evaluation metrics specifically tailored for each grounding task. By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to…
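The abstract notes that ChartAB asks models to answer in a JSON template so that task-specific metrics can be computed automatically. The sketch below shows how structured table-extraction output of that kind can be scored. It assumes a hypothetical row/column JSON schema and a simple cell-level accuracy with a relative tolerance; neither the schema nor the metric is taken from the paper, and ChartAB's actual template and metrics may differ.

```python
import json
import math

# Hypothetical grounding output for a two-series bar chart. This schema is an
# illustrative assumption, NOT ChartAB's published JSON template.
PREDICTION = """
{"columns": ["2021", "2022"],
 "rows": [{"label": "A", "values": [10.0, 12.5]},
          {"label": "B", "values": [7.0, 9.0]}]}
"""

GROUND_TRUTH = """
{"columns": ["2021", "2022"],
 "rows": [{"label": "A", "values": [10.0, 12.0]},
          {"label": "B", "values": [7.0, 8.0]}]}
"""


def to_cells(table: dict) -> dict[tuple[str, str], float]:
    """Flatten a table into {(row_label, column_name): value}."""
    cells = {}
    for row in table["rows"]:
        for col, value in zip(table["columns"], row["values"]):
            cells[(row["label"], col)] = float(value)
    return cells


def cell_accuracy(pred: dict, truth: dict, rel_tol: float = 0.05) -> float:
    """Fraction of ground-truth cells matched by the prediction.

    A cell matches if math.isclose accepts it at rel_tol (tolerance relative
    to the larger of the two magnitudes). Missing cells count as errors;
    extra predicted cells are ignored. An empty ground truth scores 0.0.
    """
    pred_cells = to_cells(pred)
    truth_cells = to_cells(truth)
    if not truth_cells:
        return 0.0
    hits = sum(
        1
        for key, true_value in truth_cells.items()
        if key in pred_cells
        and math.isclose(pred_cells[key], true_value, rel_tol=rel_tol)
    )
    return hits / len(truth_cells)


if __name__ == "__main__":
    score = cell_accuracy(json.loads(PREDICTION), json.loads(GROUND_TRUTH))
    print(f"cell accuracy: {score:.2f}")  # 3 of 4 cells within 5%
```

Keying cells by (row label, column name) rather than by position makes the score insensitive to the order in which a model lists series, which is one reason a fixed JSON template helps automatic evaluation.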
