T$^2$-RAGBench: Text-and-Table Benchmark for Evaluating Retrieval-Augmented Generation

Jan Strich; Enes Kutay Isgorur; Maximilian Trescher; Chris Biemann; Martin Semmann

arXiv:2506.12071 · cs.IR · January 19, 2026

TL;DR

This paper introduces T$^2$-RAGBench, a large-scale benchmark for evaluating retrieval-augmented generation models on real-world text-and-table data, emphasizing the importance of retrieval accuracy for complex reasoning tasks.

Contribution

The paper presents a new benchmark with 23,088 question-context-answer triples for evaluating RAG systems on text-and-table data, including a transformation of datasets into context-independent questions for reliable assessment.

Findings

1. Hybrid BM25 is the most effective retrieval approach for text-and-table data.
2. Current SOTA models still find T$^2$-RAGBench challenging.
3. Embedding models and corpus size significantly impact retrieval performance.
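
To make the first finding more concrete, here is a minimal sketch of one way a hybrid retriever can combine BM25 with dense scores. It is an illustration only: the fusion rule (reciprocal rank fusion), the parameter values `k1`, `b`, and `k`, the toy documents, and the hard-coded dense scores are all assumptions, not the configuration used in the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranks(scores):
    """Map document index to its 1-based rank (highest score first)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return {doc: r for r, doc in enumerate(order, start=1)}


def rrf_fuse(score_lists, k=60):
    """Reciprocal rank fusion over several score lists (assumed fusion rule)."""
    fused = [0.0] * len(score_lists[0])
    for scores in score_lists:
        for doc, r in ranks(scores).items():
            fused[doc] += 1.0 / (k + r)
    return fused


# Toy corpus (hypothetical): one text chunk, one unrelated chunk, one flattened table.
docs = [
    "revenue increased to 120 million in 2019".split(),
    "the company opened three new offices".split(),
    "net revenue table 2019 2018 revenue 120 110".split(),
]
query = "revenue 2019".split()

sparse = bm25_scores(query, docs)
dense = [0.62, 0.10, 0.71]  # hypothetical similarities from an embedding model
fused = rrf_fuse([sparse, dense])
best = max(range(len(docs)), key=lambda i: fused[i])
```

Rank-based fusion is used here because BM25 scores and embedding similarities live on different scales; a weighted sum of normalized scores would be another reasonable choice.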

Abstract

Since many real-world documents combine textual and tabular data, robust Retrieval-Augmented Generation (RAG) systems are essential for effectively accessing and analyzing such content to support complex reasoning tasks. Therefore, this paper introduces T$^2$-RAGBench, a benchmark comprising 23,088 question-context-answer triples, designed to evaluate RAG methods on real-world text-and-table data. Unlike typical QA datasets that operate under Oracle Context settings, T$^2$-RAGBench challenges models to first retrieve the correct context before conducting numerical reasoning. Existing QA datasets containing text-and-table data typically contain context-dependent questions, which may yield multiple correct answers depending on the provided context. To address this, we transform SOTA datasets into a context-independent format, validated by…
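
The abstract contrasts the Oracle Context setting, where the gold context is handed to the model, with a setting where the context must be retrieved first. The sketch below shows how the retrieval step could be scored over question-context-answer triples. The `Triple` class, the `retrieve` callable, the toy data, and the choice of Recall@k and MRR as metrics are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    question: str
    context_id: str  # id of the gold context for this question
    answer: str


def recall_at_k(triples, retrieve, k):
    """Fraction of questions whose gold context is among the top-k results."""
    hits = sum(1 for t in triples if t.context_id in retrieve(t.question)[:k])
    return hits / len(triples)


def mrr(triples, retrieve):
    """Mean reciprocal rank of the gold context (0 when it is not retrieved)."""
    total = 0.0
    for t in triples:
        ranked = retrieve(t.question)
        if t.context_id in ranked:
            total += 1.0 / (ranked.index(t.context_id) + 1)
    return total / len(triples)


# Hypothetical triples and a stand-in retriever backed by fixed rankings.
triples = [
    Triple("q1", "d1", "10"),
    Triple("q2", "d2", "20"),
    Triple("q3", "d3", "30"),
    Triple("q4", "d4", "40"),
]
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d3", "d2", "d1"],
    "q3": ["d1", "d2", "d4"],  # gold context d3 is missed
    "q4": ["d4", "d1", "d2"],
}


def retrieve(question):
    return rankings[question]


r1 = recall_at_k(triples, retrieve, k=1)
r3 = recall_at_k(triples, retrieve, k=3)
score = mrr(triples, retrieve)
```

Under an oracle setup, this retrieval step is skipped entirely, so a gap between oracle and retrieved-context accuracy can be traced back to numbers like these.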

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques