TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Xianjie Wu; Jian Yang; Linzheng Chai; Ge Zhang; Jiaheng Liu; Xinrun Du; Di Liang; Daixin Shu; Xianfu Cheng; Tianzhen Sun; Guanglin Niu; Tongliang Li; Zhoujun Li

arXiv:2408.09174·cs.CL·March 19, 2025

TL;DR

This paper introduces TableBench, a comprehensive benchmark for table question answering that highlights the gap between current LLM capabilities and real-world industrial requirements, and presents TableLLM, a model trained on the newly constructed TableInstruct dataset.

Contribution

The paper proposes a new complex benchmark, TableBench, and a specialized model, TableLLM, to better evaluate and improve LLM performance on real-world table QA tasks.

Findings

01

GPT-4 outperforms other models but still lags behind humans.

02

Current LLMs show significant room for improvement on real-world table QA.

03

TableLLM achieves performance comparable to GPT-3.5 on TableBench.

Abstract

Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges when applied in industrial scenarios, particularly due to the increased complexity of reasoning required with real-world tabular data, underscoring a notable disparity between academic benchmarks and practical applications. To address this discrepancy, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose a comprehensive and complex benchmark TableBench, including 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TableLLM, trained on our meticulously constructed training set TableInstruct, achieving comparable performance…

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Intelligent Tutoring Systems and Adaptive Learning

Methods: Sparse Evolutionary Training · Cosine Annealing · Weight Decay · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Linear Layer