TL;DR
This paper introduces RADAR, a framework that enhances the interpretability of multimodal large language models in visual data analysis by providing region-specific attributions in charts, thereby improving trust and answer accuracy.
Contribution
RADAR is the first semi-automatic method to generate attribution datasets and improve reasoning-based attribution in chart analysis models.
Findings
Attribution accuracy improved by 15% over baselines.
Model answers achieved an average BERTScore of ~0.90.
The benchmark dataset comprises 17,819 samples for attribution evaluation.
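This summary does not say how attribution accuracy is computed. As a rough illustration only, the sketch below assumes attributions are axis-aligned bounding boxes over the chart image and counts a prediction as correct when its intersection-over-union (IoU) with the reference box reaches a threshold. The box format, the 0.5 threshold, and the names `iou` and `attribution_accuracy` are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def attribution_accuracy(
    predicted: List[Box], reference: List[Box], threshold: float = 0.5
) -> float:
    """Fraction of predicted regions whose IoU with the reference meets the threshold.

    Hypothetical scoring rule for illustration; the paper may use a different one.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if iou(p, r) >= threshold)
    return hits / len(predicted)
```

Under this assumed rule, a prediction that highlights the wrong bar or axis region scores zero for that sample regardless of whether the textual answer is right, which is why region attribution and answer quality (for example BERTScore) are reported separately.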
Abstract
Data visualizations like charts are fundamental tools for quantitative analysis and decision-making across fields, requiring accurate interpretation and mathematical reasoning. The emergence of Multimodal Large Language Models (MLLMs) offers promising capabilities for automated visual data analysis, such as processing charts, answering questions, and generating summaries. However, they provide no visibility into which parts of the visual data informed their conclusions; this black-box nature poses significant challenges to real-world trust and adoption. In this paper, we take the first major step towards evaluating and enhancing the capabilities of MLLMs to attribute their reasoning process by highlighting the specific regions in charts and graphs that justify model answers. To this end, we contribute RADAR, a semi-automatic approach to obtain a benchmark dataset comprising 17,819…