TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

Yoonsik Kim; Moonbin Yim; Ka Yeon Song

arXiv:2404.19205·cs.CV·May 1, 2024

TL;DR

This paper introduces TableVQA-Bench, a new benchmark dataset for table visual question answering that combines images and QA pairs, and evaluates various multi-modal large language models on this challenging task.

Contribution

The paper builds a table VQA benchmark by adding the two components that the source datasets lacked, table images and QA pairs, and analyzes the performance of different multi-modal models on it.

Findings

01. GPT-4V achieves the highest accuracy among tested models.

02. Performance is significantly affected by the number of vision queries.

03. Visual inputs are more challenging for models than text inputs.

Abstract

In this paper, we establish a benchmark for table visual question answering, referred to as TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. Existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA. The primary objective of this paper is therefore to obtain these components. Specifically, images are sourced either by applying a stylesheet or by employing the proposed table rendering system. QA pairs are generated with a large language model (LLM) whose input is a text-formatted table. The completed TableVQA-Bench comprises 1,500 QA pairs. We comprehensively compare the performance of various multi-modal large language models (MLLMs) on TableVQA-Bench. GPT-4V achieves the highest…
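The two construction steps named in the abstract (styling a text-formatted table so it can be rendered as an image, and prompting an LLM for QA pairs about that table) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the stylesheet, the function names `table_to_html` and `build_qa_prompt`, the prompt wording, and the toy table are hypothetical and are not taken from the paper or its repository. Rasterizing the HTML to an image and calling an actual LLM are left out.

```python
from html import escape

# Hypothetical stylesheet; the styles used by the paper are not given on this page.
DEFAULT_CSS = (
    "table { border-collapse: collapse; } "
    "th, td { border: 1px solid #333; padding: 4px 8px; }"
)


def table_to_html(header, rows, css=DEFAULT_CSS):
    """Return a standalone HTML page for one table with a stylesheet applied."""
    head = "".join(f"<th>{escape(str(cell))}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return (
        f"<html><head><style>{css}</style></head><body>"
        f"<table><tr>{head}</tr>{body}</table></body></html>"
    )


def build_qa_prompt(table_text, n_pairs=3):
    """Assemble an illustrative prompt asking an LLM for QA pairs about a table."""
    return (
        f"Given the table below, write {n_pairs} question-answer pairs "
        "that can be answered from the table alone.\n\n" + table_text
    )


# Toy table, invented purely to exercise the two helpers.
page = table_to_html(["Item", "Count"], [["apples", 3], ["pears <ripe>", 5]])
prompt = build_qa_prompt(page, n_pairs=2)
```

In this sketch the same text-formatted table feeds both steps, mirroring the abstract's description that the LLM sees text while the benchmark image comes from the styled rendering. A headless browser or similar tool would be needed to turn `page` into an image.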

Code & Models

Repositories

naver-ai/tablevqabench (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization