Metadata-Driven Retrieval-Augmented Generation for Financial Question Answering
Michail Dadopoulos, Anestis Ladas, Stratos Moschidis, Ioannis Negkakis

TL;DR
This paper develops a metadata-driven Retrieval-Augmented Generation system tailored for financial document question answering, demonstrating that contextual embeddings and a multi-stage architecture significantly improve retrieval accuracy and efficiency.
Contribution
It introduces a novel multi-stage RAG architecture utilizing LLM-generated metadata and contextual embeddings, advancing retrieval precision in financial QA tasks.
Findings
Embedding metadata with text improves retrieval performance.
A powerful reranker enhances precision significantly.
The proposed system offers a cost-effective alternative to commercial solutions.
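The first finding can be illustrated with a minimal sketch of a "contextual chunk", assuming the metadata is serialized as a short header and prepended to the chunk text before it is embedded. The `Chunk` class, the `contextual_text` helper, and the metadata fields below are hypothetical illustrations, not the paper's actual schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    # Hypothetical metadata fields; the paper's real schema is not shown on this page.
    metadata: dict = field(default_factory=dict)


def contextual_text(chunk: Chunk) -> str:
    """Serialize metadata as a header and prepend it to the chunk text.

    The returned string is what would be passed to an embedding model,
    so that metadata terms influence the resulting vector.
    """
    header = "\n".join(
        f"{key}: {value}" for key, value in sorted(chunk.metadata.items())
    )
    return f"{header}\n\n{chunk.text}" if header else chunk.text


# Made-up example values, for illustration only.
chunk = Chunk(
    text="Net revenue increased 12% year over year.",
    metadata={"company": "ExampleCorp", "fiscal_year": 2022, "section": "MD&A"},
)
embedding_input = contextual_text(chunk)
print(embedding_input)
```

With this layout, a query that names the company or fiscal year can match the chunk even when the chunk body never mentions either.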
Abstract
Retrieval-Augmented Generation (RAG) struggles on long, structured financial filings where relevant evidence is sparse and cross-referenced. This paper presents a systematic investigation of advanced metadata-driven RAG techniques, proposing and evaluating a novel multi-stage RAG architecture that leverages LLM-generated metadata. We introduce a sophisticated indexing pipeline to create contextually rich document chunks and benchmark a spectrum of enhancements, including pre-retrieval filtering, post-retrieval reranking, and enriched embeddings, on the FinanceBench dataset. Our results reveal that while a powerful reranker is essential for precision, the most significant performance gains come from embedding chunk metadata directly with text ("contextual chunks"). Our proposed optimal architecture combines LLM-driven pre-retrieval…
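The stages named in the abstract (pre-retrieval filtering, retrieval over metadata-enriched chunks, post-retrieval reranking) can be sketched as follows. This is a toy illustration under stated assumptions: token overlap stands in for both the embedding model and the reranker, and every function name, parameter, and data value is hypothetical rather than taken from the paper.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(query, text):
    """Toy first-pass score (stand-in for embedding similarity)."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if (q | t) else 0.0


def coverage(query, text):
    """Toy rerank score (stand-in for a reranker model): share of query tokens found."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q) if q else 0.0


def retrieve(query, chunks, filters, top_k=3, top_n=1):
    # Stage 1: pre-retrieval filtering on exact metadata matches.
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in filters.items())
    ]

    # Stage 2: first-pass retrieval over "contextual" text (metadata + body).
    def contextual(c):
        return " ".join(str(v) for v in c["metadata"].values()) + " " + c["text"]

    first_pass = sorted(
        candidates, key=lambda c: jaccard(query, contextual(c)), reverse=True
    )[:top_k]

    # Stage 3: rerank the short list with a second, independent scorer.
    reranked = sorted(
        first_pass, key=lambda c: coverage(query, c["text"]), reverse=True
    )
    return reranked[:top_n]


# Made-up corpus, for illustration only.
chunks = [
    {"text": "Total revenue was 10 million in 2022.",
     "metadata": {"company": "AlphaCo", "year": 2022}},
    {"text": "Total revenue was 8 million in 2022.",
     "metadata": {"company": "BetaCo", "year": 2022}},
    {"text": "Operating expenses rose due to hiring.",
     "metadata": {"company": "AlphaCo", "year": 2022}},
]
results = retrieve(
    "What was AlphaCo total revenue in 2022?", chunks, {"company": "AlphaCo"}
)
print(results[0]["text"])
```

In this sketch the filter removes the near-duplicate BetaCo chunk before any scoring happens, which is the kind of confusion a metadata filter is meant to prevent when many filings share similar wording.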
