TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Xianjie Wu, Jian Yang, Linzheng Chai, Ge Zhang, Jiaheng Liu, Xinrun, Du, Di Liang, Daixin Shu, Xianfu Cheng, Tianzhen Sun, Guanglin Niu, Tongliang, Li, Zhoujun Li

TL;DR
This paper introduces TableBench, a comprehensive benchmark for table question answering that highlights the gap between current LLM capabilities and real-world industrial requirements, and presents TableLLM trained on a new dataset.
Contribution
The paper proposes a new complex benchmark, TableBench, and a specialized model, TableLLM, to better evaluate and improve LLM performance on real-world table QA tasks.
Findings
GPT-4 outperforms other models but still lags behind humans.
Current LLMs show significant room for improvement on real-world table QA.
TableLLM achieves comparable performance to GPT-3.5 on TableBench.
Abstract
Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities. Despite these achievements, LLMs still encounter significant challenges when applied in industrial scenarios, particularly due to the increased complexity of reasoning required with real-world tabular data, underscoring a notable disparity between academic benchmarks and practical applications. To address this discrepancy, we conduct a detailed investigation into the application of tabular data in industrial scenarios and propose a comprehensive and complex benchmark TableBench, including 18 fields within four major categories of table question answering (TableQA) capabilities. Furthermore, we introduce TableLLM, trained on our meticulously constructed training set TableInstruct, achieving comparable performance…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Advanced Text Analysis Techniques · Intelligent Tutoring Systems and Adaptive Learning
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Sparse Evolutionary Training · Cosine Annealing · Weight Decay · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Linear Layer
