Decomposing Retrieval Failures in RAG for Long-Document Financial Question Answering
Amine Kobeissi, Philippe Langlais

TL;DR
This paper investigates retrieval failures in financial long-document question answering, analyzing different retrieval granularities, introducing an oracle analysis, and proposing a domain-specific page scorer to improve retrieval accuracy.
Contribution
It systematically studies within-document retrieval failures, evaluates multiple retrieval strategies, and introduces a fine-tuned page scorer to enhance retrieval performance in financial QA.
Findings
The fine-tuned page scorer improves page-level and chunk-level retrieval.
Oracle analysis shows remaining headroom in page and chunk retrieval.
Diverse retrieval strategies show gains in document discovery and recall.
Abstract
Retrieval-augmented generation is increasingly used for financial question answering over long regulatory filings, yet reliability depends on retrieving the exact context needed to justify answers in high-stakes settings. We study a frequent failure mode in which the correct document is retrieved but the page or chunk that contains the answer is missed, leading the generator to extrapolate from incomplete context. Despite its practical significance, this within-document retrieval failure mode has received limited systematic attention in the Financial Question Answering (QA) literature. We evaluate retrieval at multiple levels of granularity (document, page, and chunk) and introduce an oracle-based analysis to provide empirical upper bounds on retrieval and generative performance. On a 150-question subset of FinanceBench, we reproduce and compare diverse retrieval strategies…
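The multi-granularity evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `Hit` record, the `hit_at_k` and `within_document_failure_rate` helpers, the toy data, and the convention that chunk indices are local to a page are all assumptions made for illustration. The sketch checks whether any top-k result matches a gold evidence location at the document, page, or chunk level, then counts questions where the document is found but the gold page is missed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical location record: document id, page number, chunk index on that page."""
    doc: str
    page: int
    chunk: int


# How much of a location must match at each granularity (assumed convention).
_KEYS = {
    "document": lambda h: (h.doc,),
    "page": lambda h: (h.doc, h.page),
    "chunk": lambda h: (h.doc, h.page, h.chunk),
}


def hit_at_k(retrieved, gold, k, level):
    """Return True if any of the top-k retrieved items matches a gold item at `level`."""
    key = _KEYS[level]
    gold_keys = {key(g) for g in gold}
    return any(key(h) in gold_keys for h in retrieved[:k])


def within_document_failure_rate(runs, k):
    """Fraction of questions with a document-level hit but no page-level hit in the top k."""
    if not runs:
        return 0.0
    failures = sum(
        1
        for retrieved, gold in runs
        if hit_at_k(retrieved, gold, k, "document")
        and not hit_at_k(retrieved, gold, k, "page")
    )
    return failures / len(runs)


# Toy data (invented): each entry is (ranked retrieved list, gold evidence list).
runs = [
    ([Hit("A", 3, 0), Hit("A", 7, 1)], [Hit("A", 7, 1)]),   # document, page, and chunk hit
    ([Hit("B", 1, 0), Hit("B", 2, 0)], [Hit("B", 40, 2)]),  # right document, wrong page
    ([Hit("C", 5, 0)], [Hit("D", 5, 0)]),                   # wrong document
]

rate = within_document_failure_rate(runs, k=2)
print(f"within-document failure rate @2: {rate:.2f}")
```

Only the second toy question counts as a within-document failure, because the third misses the document entirely. An oracle-style variant of this sketch would replace the retrieved list with the gold pages or chunks before generation, which is one way to read the upper-bound analysis the abstract mentions.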
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Text Analysis Techniques
