Hierarchical Retrieval with Evidence Curation for Open-Domain Financial Question Answering on Standardized Documents
Jaeyoung Choe, Jihoon Kim, Woohwan Jung

TL;DR
This paper introduces HiREC, a hierarchical retrieval and evidence curation framework designed to improve open-domain financial question answering by effectively handling standardized documents with repetitive structures.
Contribution
The paper proposes a hierarchical retrieval and evidence curation method tailored to financial documents, addressing the duplicate retrieval that near-identical passages cause in RAG pipelines.
Findings
Improved accuracy in financial QA tasks.
Constructed a large-scale financial QA benchmark.
Demonstrated effectiveness of hierarchical retrieval.
Abstract
Retrieval-augmented generation (RAG) based large language models (LLMs) are widely used in finance for their excellent performance on knowledge-intensive tasks. However, standardized documents (e.g., SEC filings) share similar formats, such as repetitive boilerplate text and similar table structures. This similarity causes traditional RAG methods to confuse near-duplicate text, leading to duplicate retrieval that undermines accuracy and completeness. To address these issues, we propose the Hierarchical Retrieval with Evidence Curation (HiREC) framework. Our approach first performs hierarchical retrieval to reduce confusion among similar texts: it retrieves related documents and then selects the most relevant passages from those documents. The evidence curation process then removes irrelevant passages. When necessary, it automatically generates complementary queries to collect missing…
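The two-stage idea in the abstract (pick documents first, then passages, then filter) can be illustrated with a minimal sketch. This is not the authors' implementation: the token-overlap scorer, the function names, the threshold, and the toy corpus with made-up companies and figures are all assumptions chosen for illustration. The complementary-query step is left out because its description is truncated above.

```python
import re

def tokenize(text):
    # Lowercased alphanumeric tokens; a stand-in for a real retriever's encoder.
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(query, text):
    # Toy relevance score (assumption): fraction of query tokens found in text.
    q = set(tokenize(query))
    t = set(tokenize(text))
    return len(q & t) / len(q) if q else 0.0

def hierarchical_retrieve(query, corpus, top_docs=1, top_passages=2):
    # Stage 1: rank whole documents using their titles.
    ranked_docs = sorted(
        corpus, key=lambda d: overlap_score(query, d["title"]), reverse=True
    )
    selected = ranked_docs[:top_docs]
    # Stage 2: rank passages only inside the selected documents, so a
    # near-duplicate passage from another filing cannot be returned.
    candidates = [(d["title"], p) for d in selected for p in d["passages"]]
    candidates.sort(key=lambda c: overlap_score(query, c[1]), reverse=True)
    return candidates[:top_passages]

def curate(query, passages, threshold=0.3):
    # Evidence curation stand-in (assumption): drop low-scoring passages.
    return [(t, p) for t, p in passages if overlap_score(query, p) >= threshold]

# Hypothetical toy corpus: two filings with near-identical passages.
corpus = [
    {
        "title": "Acme Corp 10-K 2023",
        "passages": [
            "Total net sales were 100 million dollars in 2023.",
            "The company is subject to risks and uncertainties.",
        ],
    },
    {
        "title": "Globex Inc 10-K 2023",
        "passages": [
            "Total net sales were 250 million dollars in 2023.",
            "The company is subject to risks and uncertainties.",
        ],
    },
]

query = "What were Acme total net sales in 2023?"
evidence = curate(query, hierarchical_retrieve(query, corpus))
```

In this toy corpus the two sales passages receive the same passage-level score, so a flat passage retriever has no way to prefer the Acme one; restricting the search to the document chosen in stage 1 removes the ambiguity, and the curation step then discards the boilerplate risk sentence.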
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Information Retrieval and Search Behavior
