T$^2$-RAGBench: Text-and-Table Benchmark for Evaluating Retrieval-Augmented Generation
Jan Strich, Enes Kutay Isgorur, Maximilian Trescher, Chris Biemann, Martin Semmann

TL;DR
This paper introduces $T^2$-RAGBench, a large-scale benchmark for evaluating retrieval-augmented generation models on real-world text-and-table data, emphasizing the importance of retrieval accuracy for complex reasoning tasks.
Contribution
The paper presents a new benchmark of 23,088 question-context-answer triples for evaluating RAG systems on text-and-table data. To make the assessment reliable, questions from existing datasets are rewritten into a context-independent form.
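As a rough illustration of how question-context-answer triples can be used to score the retrieval step, here is a minimal sketch. The `Triple` class, its field names, and the `recall_at_k` helper are hypothetical and are not taken from the paper's code. The sketch assumes each question has exactly one gold context, identified by an id.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Triple:
    """One benchmark item (hypothetical field names, for illustration only)."""
    question: str    # context-independent question
    context_id: str  # id of the single gold context (assumption: one per question)
    answer: str      # reference answer


def recall_at_k(
    triples: List[Triple],
    retrieve: Callable[[str, int], List[str]],
    k: int = 3,
) -> float:
    """Fraction of questions whose gold context id is among the top-k retrieved ids."""
    if not triples:
        return 0.0
    hits = sum(1 for t in triples if t.context_id in retrieve(t.question, k))
    return hits / len(triples)
```

Here `retrieve` stands for any retriever that maps a question and a cutoff `k` to a ranked list of context ids. Only a context-independent question makes this check meaningful, since a context-dependent question could be answered correctly from several different documents.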
Findings
Hybrid BM25 is the most effective retrieval approach for text-and-table data.
Current SOTA models still find $T^2$-RAGBench challenging.
Embedding models and corpus size significantly impact retrieval performance.
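The summary above does not say how Hybrid BM25 is built, so the following is only a sketch of one common way to combine a lexical and a vector-based ranking. It makes several assumptions: BM25 uses the standard Okapi formula with typical defaults (`k1=1.5`, `b=0.75`), a toy term-count cosine similarity stands in for a real embedding model, and the two rankings are merged with reciprocal rank fusion, which is a swapped-in technique and may differ from the paper's fusion method. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(term for t in toks for term in set(t))  # document frequency
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def cosine_scores(query: str, docs: List[str]) -> List[float]:
    """Toy stand-in for dense retrieval: cosine similarity of term-count vectors."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = []
    for d in docs:
        v = Counter(tokenize(d))
        d_norm = math.sqrt(sum(x * x for x in v.values()))
        dot = sum(q[term] * v[term] for term in q)
        scores.append(dot / (q_norm * d_norm) if q_norm and d_norm else 0.0)
    return scores


def to_ranks(scores: List[float]) -> List[int]:
    """Convert scores to 1-based ranks (rank 1 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def hybrid_rank(query: str, docs: List[str], rrf_k: int = 60) -> List[int]:
    """Document indices ordered by reciprocal rank fusion of both rankings."""
    lexical = to_ranks(bm25_scores(query, docs))
    dense = to_ranks(cosine_scores(query, docs))
    fused = [1 / (rrf_k + lexical[i]) + 1 / (rrf_k + dense[i]) for i in range(len(docs))]
    return sorted(range(len(docs)), key=lambda i: -fused[i])
```

Rank-based fusion is used here because BM25 scores and cosine similarities live on different scales, so adding raw scores would let one signal dominate. In a real pipeline, `cosine_scores` would be replaced by similarities from an actual embedding model.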
Abstract
Since many real-world documents combine textual and tabular data, robust Retrieval-Augmented Generation (RAG) systems are essential for effectively accessing and analyzing such content to support complex reasoning tasks. Therefore, this paper introduces $T^2$-RAGBench, a benchmark comprising 23,088 question-context-answer triples, designed to evaluate RAG methods on real-world text-and-table data. Unlike typical QA datasets that operate under settings where the correct context is already provided, $T^2$-RAGBench challenges models to first retrieve the correct context before conducting numerical reasoning. Existing QA datasets containing text-and-table data typically contain context-dependent questions, which may yield multiple correct answers depending on the provided context. To address this, we transform SOTA datasets into a context-independent format, validated by…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
