ViTaB-A: Evaluating Multimodal Large Language Models on Visual Table Attribution

Yahia Alqurnawi; Preetom Biswas; Anmol Rao; Tejas Anvekar; Chitta Baral; Vivek Gupta

arXiv:2602.15769·cs.CL·February 18, 2026

TL;DR

This paper evaluates multimodal large language models' ability to attribute answers to specific parts of structured data like tables, revealing significant gaps in attribution accuracy and reliability across formats and models.

Contribution

It introduces a systematic evaluation of structured data attribution in mLLMs, highlighting their current limitations in providing trustworthy evidence support.

Findings

01. Attribution accuracy is near random for JSON inputs.
02. Models are more reliable at citing rows than columns.
03. Performance varies significantly across model families.

Abstract

Multimodal Large Language Models (mLLMs) are often used to answer questions over structured data such as tables presented as Markdown, JSON, or images. While these models can often give correct answers, users also need to know where those answers come from. In this work, we study structured data attribution (citation): the ability of a model to point to the specific rows and columns that support an answer. We evaluate several mLLMs across different table formats and prompting strategies. Our results show a clear gap between question answering and evidence attribution. Although question answering accuracy remains moderate, attribution accuracy is much lower across all models, and is near random for JSON inputs. We also find that models are more reliable at citing rows than columns, and struggle more with textual formats than with images. Finally, we observe notable differences across model families.…
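To make the attribution task concrete, the following is a minimal sketch of how a predicted citation (a set of rows and a set of columns) might be compared against gold evidence. It is an illustration only: the `Attribution` type, the `set_f1` and `score_attribution` helpers, and the choice of set-level F1 and exact match are assumptions made here, not the metrics or code used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """Hypothetical container: the rows and columns cited as evidence."""
    rows: frozenset
    cols: frozenset


def set_f1(pred: frozenset, gold: frozenset) -> float:
    """F1 overlap between a predicted and a gold set of identifiers."""
    if not pred and not gold:
        return 1.0  # nothing to cite and nothing cited
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_attribution(pred: Attribution, gold: Attribution) -> dict:
    """Score rows and columns separately, plus a strict exact match."""
    return {
        "row_f1": set_f1(pred.rows, gold.rows),
        "col_f1": set_f1(pred.cols, gold.cols),
        "exact": float(pred == gold),
    }


# Example: the model cites the right row but misses one supporting column.
gold = Attribution(rows=frozenset({2}), cols=frozenset({"Year", "Revenue"}))
pred = Attribution(rows=frozenset({2}), cols=frozenset({"Revenue"}))
scores = score_attribution(pred, gold)
```

Scoring rows and columns separately is what allows a comparison such as "more reliable at citing rows than columns" to be expressed at all; a single exact-match score would hide that difference.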

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Computational and Text Analysis Methods