From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents
Meftun Akarsu, Recep Kaan Karaman, and Christopher Mierbach

TL;DR
This paper systematically compares ten retrieval strategies for heterogeneous text-and-table documents in financial question answering (QA), highlighting the effectiveness of hybrid retrieval combined with neural reranking.
Contribution
It introduces a comprehensive benchmark of retrieval methods on mixed text-and-table documents and shows that hybrid retrieval followed by neural reranking outperforms single-stage approaches.
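To make the two-stage idea concrete, here is a minimal sketch of hybrid fusion followed by reranking. The summary does not say which fusion rule the authors use, so Reciprocal Rank Fusion (RRF) with the commonly cited constant k=60 is swapped in as one common choice. The names `reciprocal_rank_fusion`, `two_stage_retrieve`, and `rerank_score` are hypothetical, and `rerank_score` is only a stand-in for a cross-encoder.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of doc ids with RRF (assumed rule).

    Each document gets sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def two_stage_retrieve(rankings, rerank_score, candidates=20, top_k=5):
    """Stage 1: fuse sparse and dense runs. Stage 2: rerank the fused head.

    rerank_score is a hypothetical callable standing in for a cross-encoder;
    it maps a doc id to a relevance score (higher is better).
    """
    fused = reciprocal_rank_fusion(rankings)[:candidates]
    reranked = sorted(fused, key=rerank_score, reverse=True)
    return reranked[:top_k]


bm25_run = ["d3", "d1", "d7", "d2"]   # toy sparse ranking
dense_run = ["d1", "d4", "d3", "d9"]  # toy dense ranking
print(reciprocal_rank_fusion([bm25_run, dense_run]))
# ['d1', 'd3', 'd4', 'd7', 'd2', 'd9']
```

RRF only uses ranks, so it sidesteps the problem that BM25 scores and embedding similarities live on incomparable scales; the expensive reranker then only sees a short fused candidate list.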
Findings
Hybrid retrieval with neural reranking achieves Recall@5 of 0.816 and MRR@3 of 0.605, outperforming all single-stage methods by a large margin.
BM25 outperforms dense retrieval on financial documents.
Query expansion offers limited benefits for queries that require numerical precision.
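One plausible reason lexical retrieval does well here is exact matching on terms such as years and line-item names. The sketch below is a plain Okapi BM25 scorer, assuming whitespace tokenization, the common defaults k1=1.5 and b=0.75, and a Lucene-style non-negative IDF; the paper's actual BM25 implementation and parameters are not given in this summary, and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for one tokenized query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Lucene-style IDF: the +1 inside the log keeps weights non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency with document-length normalization.
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores


docs = [
    "net revenue 2019 was 4,512 million".split(),
    "operating expenses increased due to higher marketing spend".split(),
    "revenue table fiscal 2018 3,987 million".split(),
]
query = "revenue 2019".split()
print(bm25_scores(query, docs))
```

The first document scores highest because it matches both "revenue" and the exact year token "2019"; a dense encoder may place "2018" and "2019" close together in embedding space, which is one hedged interpretation of the finding above.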
Abstract
Retrieval-Augmented Generation (RAG) systems critically depend on retrieval quality, yet no systematic comparison of modern retrieval methods exists for heterogeneous documents containing both text and tabular data. We benchmark ten retrieval strategies spanning sparse, dense, hybrid fusion, cross-encoder reranking, query expansion, index augmentation, and adaptive retrieval on a challenging financial QA benchmark of 23,088 queries over 7,318 documents with mixed text-and-table content. We evaluate retrieval quality via Recall@k, MRR, and nDCG, and end-to-end generation quality via Number Match, with paired bootstrap significance testing. Our results show that (1) a two-stage pipeline combining hybrid retrieval with neural reranking achieves Recall@5 of 0.816 and MRR@3 of 0.605, outperforming all single-stage methods by a large margin; (2) BM25 outperforms state-of-the-art dense…
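The abstract names Recall@k, MRR, nDCG, and paired bootstrap significance testing. Below is a minimal sketch of those measures, assuming binary relevance judgments (a set of gold doc ids per query) and the common "resample queries, count how often A fails to beat B" form of the paired bootstrap. The exact metric variants and resampling scheme used in the paper are not stated in this summary, and all function names are hypothetical.

```python
import math
import random


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant doc ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant hit within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@k with a log2(rank + 1) discount."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked[:k], start=1)
              if doc_id in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0


def paired_bootstrap_p(scores_a, scores_b, n_resamples=10_000, seed=0):
    """One-sided paired bootstrap p-value that system A beats system B.

    scores_a[i] and scores_b[i] are per-query metric values for the same query.
    Queries are resampled with replacement; the p-value is the fraction of
    resamples in which A's total is not greater than B's.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    not_better = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) <= 0:
            not_better += 1
    return not_better / n_resamples


ranked = ["d2", "d5", "d1"]
relevant = {"d1"}
print(recall_at_k(ranked, relevant, 3), mrr_at_k(ranked, relevant, 3),
      ndcg_at_k(ranked, relevant, 3))
# 1.0 0.3333333333333333 0.5
```

Pairing matters because both systems are scored on the same queries: resampling per-query differences removes between-query variance that an unpaired test would treat as noise.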
