TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains
Yoonsik Kim, Moonbin Yim, Ka Yeon Song

TL;DR
This paper introduces TableVQA-Bench, a new benchmark dataset for table visual question answering that combines images and QA pairs, and evaluates various multi-modal large language models on this challenging task.
Contribution
The paper creates the first comprehensive benchmark for table VQA with images and QA pairs, and analyzes the performance of different multi-modal models on it.
Findings
GPT-4V achieves the highest accuracy among tested models.
Performance is significantly affected by the number of vision queries.
Visual inputs are more challenging for models than text inputs.
Abstract
In this paper, we establish a benchmark for table visual question answering, referred to as the TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. It is important to note that existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA. As such, the primary objective of this paper is to obtain these necessary components. Specifically, images are sourced either through the application of a stylesheet or by employing the proposed table rendering system. QA pairs are generated by exploiting the large language model (LLM) where the input is a text-formatted table. Ultimately, the completed TableVQA-Bench comprises 1,500 QA pairs. We comprehensively compare the performance of various multi-modal large language models (MLLMs) on TableVQA-Bench. GPT-4V achieves the highest…
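The two construction steps named in the abstract (styling a table so it can be rendered as an image, and passing a text-formatted table to an LLM to obtain QA pairs) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the CSS, the pipe-separated text format, and the prompt wording are hypothetical and are not taken from the paper. The sketch stops at producing HTML and a prompt string; rasterizing the HTML and calling an LLM are left out.

```python
from html import escape

# Hypothetical stylesheet; the paper's actual styles are not given in this summary.
DEFAULT_CSS = (
    "table { border-collapse: collapse; } "
    "td, th { border: 1px solid #333; padding: 4px; }"
)


def table_to_html(header, rows, css=DEFAULT_CSS):
    """Return a standalone HTML page for the table, styled by `css`."""
    head = "".join(f"<th>{escape(str(c))}</th>" for c in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        f"<html><head><style>{css}</style></head><body>"
        f"<table><tr>{head}</tr>{body}</table></body></html>"
    )


def table_to_text(header, rows):
    """Flatten the table into pipe-separated text (an assumed format)."""
    lines = [" | ".join(str(c) for c in header)]
    lines += [" | ".join(str(c) for c in row) for row in rows]
    return "\n".join(lines)


def build_qa_prompt(header, rows, n_pairs=3):
    """Build an illustrative QA-generation prompt from the text-formatted table."""
    return (
        f"Given the table below, write {n_pairs} question-answer pairs "
        "that can be answered from the table alone.\n\n"
        + table_to_text(header, rows)
    )


if __name__ == "__main__":
    # Toy data, not drawn from the benchmark.
    header = ["Item", "Count"]
    rows = [["apples", 3], ["pears", 5]]
    print(table_to_html(header, rows))
    print(build_qa_prompt(header, rows, n_pairs=2))
```

Swapping `DEFAULT_CSS` for a different stylesheet changes the table's appearance without touching its content, which is the property that lets one text table yield visually varied images.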
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
