TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering
Junnan Zhu, Jingyi Wang, Bohan Yu, Xiaoyu Wu, Junbo Li, Lei Wang, Nan Xu

TL;DR
TableEval is a benchmark for evaluating large language models on complex, multilingual, and multi-structured table question answering (TableQA). It is built from real-world tables and comes with a new evaluation framework.
Contribution
The paper introduces TableEval, a realistic, multi-domain, multilingual TableQA benchmark with a novel semantic accuracy metric, SEAT, to better assess LLM performance on complex table reasoning.
Findings
State-of-the-art LLMs show significant performance gaps on complex TableQA tasks.
SEAT correlates highly with human judgment, improving evaluation accuracy.
Tables from diverse domains and languages reveal limitations of current models.
Abstract
LLMs have shown impressive progress in natural language processing. However, they still face significant challenges in TableQA, where real-world complexities such as diverse table structures, multilingual data, and domain-specific reasoning are crucial. Existing TableQA benchmarks are often limited by their focus on simple flat tables and suffer from data leakage. Furthermore, most benchmarks are monolingual and fail to capture the cross-lingual and cross-domain variability in practical applications. To address these limitations, we introduce TableEval, a new benchmark designed to evaluate LLMs on realistic TableQA tasks. Specifically, TableEval includes tables with various structures (such as concise, hierarchical, and nested tables) collected from four domains (government, finance, academia, and industry reports). In addition, TableEval features cross-lingual scenarios with…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Text Readability and Simplification
