Uncovering Limitations of Large Language Models in Information Seeking from Tables
Chaoxu Pang, Yixuan Cao, Chunhao Yang, Ping Luo

TL;DR
This paper evaluates how well large language models seek information from tables. It introduces a new benchmark, TabIS, which shows that most models have a limited understanding of table structure and lack robustness, underscoring the need for improved models.
Contribution
The paper presents a reliable, single-choice-question benchmark for Table Information Seeking (TabIS) and analyzes the performance of 12 LLMs, exposing deficiencies in table structure understanding and robustness.
Findings
GPT-4-turbo's performance is only marginally satisfactory
Most models perform inadequately in TIS tasks
LLMs struggle with table structure comprehension
Abstract
Tables are recognized for their high information density and widespread usage, serving as essential sources of information. Seeking information from tables (TIS) is a crucial capability for Large Language Models (LLMs), serving as the foundation of knowledge-based Q&A systems. However, this field presently suffers from an absence of thorough and reliable evaluation. This paper introduces a more reliable benchmark for Table Information Seeking (TabIS). To avoid the unreliable evaluation caused by text similarity-based metrics, TabIS adopts a single-choice question format (with two options per question) instead of a text generation format. We establish an effective pipeline for generating options, ensuring their difficulty and quality. Experiments conducted on 12 LLMs reveal that while the performance of GPT-4-turbo is marginally satisfactory, both other proprietary and open-source models…
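The single-choice format described in the abstract can be scored by exact match on the chosen option instead of by text similarity. The following is a minimal sketch of that idea, not the authors' pipeline: the `ChoiceQuestion` fields, the prompt wording, the table serialization, and the `parse_choice` rule are all assumptions made for illustration, and `model` stands in for any function that maps a prompt to a reply string.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ChoiceQuestion:
    """One two-option item (hypothetical schema, not the TabIS data format)."""
    table: str     # table serialized as text; the serialization is an assumption
    question: str
    option_a: str
    option_b: str
    answer: str    # gold label, "A" or "B"


def build_prompt(q: ChoiceQuestion) -> str:
    """Assemble a prompt; the wording here is illustrative only."""
    return (
        f"{q.table}\n\n"
        f"Question: {q.question}\n"
        f"A. {q.option_a}\n"
        f"B. {q.option_b}\n"
        "Answer with A or B."
    )


def parse_choice(reply: str) -> Optional[str]:
    """Return "A" or "B" if the reply starts with that label, else None.

    Assumed rule: an uppercase A or B at the start, optionally in
    parentheses, and not followed by another letter (so "Because" is
    rejected). Replies such as "A table shows ..." would still be read
    as "A", so a real harness would need a stricter protocol.
    """
    match = re.match(r"\s*\(?([AB])\)?(?![A-Za-z])", reply)
    return match.group(1) if match else None


def accuracy(questions: Sequence[ChoiceQuestion],
             model: Callable[[str], str]) -> float:
    """Fraction of questions whose parsed choice equals the gold label.

    Unparseable replies count as wrong.
    """
    if not questions:
        raise ValueError("questions must not be empty")
    correct = sum(
        parse_choice(model(build_prompt(q))) == q.answer for q in questions
    )
    return correct / len(questions)
```

Because each question has exactly two options, random guessing yields an expected accuracy of 50%, which is the baseline any reported score should be read against.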
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Advanced Text Analysis Techniques
