TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension

Zipeng Qiu; You Peng; Guangxin He; Binhang Yuan; Chen Wang

arXiv:2411.19504 · cs.AI · December 2, 2024

TL;DR

TQA-Bench is a comprehensive benchmark designed to evaluate large language models' ability to perform complex question answering over multi-table relational data, incorporating real-world datasets, scalable contexts, and symbolic reasoning extensions.

Contribution

We introduce TQA-Bench, a novel multi-table QA benchmark with scalable contexts and symbolic reasoning, addressing the limitations of existing single-table focused benchmarks.

Findings

01

LLMs show varying performance on multi-table QA tasks.

02

Symbolic extensions improve reasoning capabilities.

03

Larger models generally perform better on complex multi-table questions.

Abstract

The advent of large language models (LLMs) has unlocked great opportunities in complex data management tasks, particularly in question answering (QA) over complicated multi-table relational data. Despite significant progress, systematically evaluating LLMs on multi-table QA remains a critical challenge due to the inherent complexity of analyzing heterogeneous table structures and the potentially large scale of serialized relational data. Existing benchmarks primarily focus on single-table QA and fail to capture the intricacies of reasoning across multiple relational tables, as required in real-world domains such as finance, healthcare, and e-commerce. To address this gap, we present TQA-Bench, a new multi-table QA benchmark designed to evaluate the capabilities of LLMs in tackling complex QA tasks over relational data. Our benchmark incorporates diverse relational database instances sourced…

Code & Models

Repositories

relaxed-system-lab/tqa-bench
Official

Datasets

trl-lab/tabular-reasoning
dataset · 29 dl

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Advanced Text Analysis Techniques

Methods: Focus