TL;DR
This paper evaluates and compares three retrieval architectures—Vector RAG, Tree Reasoning, and a new Adaptive Hybrid Retrieval—across financial, legal, and medical documents, highlighting their strengths and limitations for different query types.
Contribution
The paper introduces the Adaptive Hybrid Retrieval framework and a four-tier query complexity benchmark, and shows that different retrieval methods excel at different levels of query complexity.
Findings
Tree Reasoning achieves the highest overall score (0.900).
Adaptive Hybrid Retrieval (AHR) outperforms the other architectures on cross-reference and multi-section queries.
Cross-reference recall is 100% for tree-based and hybrid approaches.
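This summary does not give the paper's exact definition of cross-reference recall. As a minimal sketch, assuming it is the fraction of sections a query is known to reference that the retriever actually returns, the metric could be computed as follows. The function name and section identifiers are hypothetical and chosen only for illustration.

```python
def cross_reference_recall(expected, retrieved):
    """Fraction of expected cross-referenced sections found in the retrieved set.

    Assumed definition for illustration; the paper may define the metric differently.
    """
    expected_set = set(expected)
    if not expected_set:
        raise ValueError("expected must contain at least one section id")
    return len(expected_set & set(retrieved)) / len(expected_set)


# Hypothetical section identifiers, not taken from the paper's data.
expected_sections = ["item_7", "note_12"]

# Both referenced sections are retrieved (extra sections do not lower recall).
full_hit = cross_reference_recall(expected_sections, ["item_7", "note_12", "item_1a"])

# Only one of the two referenced sections is retrieved.
partial_hit = cross_reference_recall(expected_sections, ["item_7"])
```

Under this assumed definition, a reported recall of 100% would mean every referenced section was retrieved for every cross-reference query in the benchmark.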
Abstract
Retrieval-Augmented Generation (RAG) has become the standard paradigm for grounding Large Language Model outputs in external knowledge. Lumer et al. [1] presented the first systematic evaluation comparing vector-based agentic RAG against hierarchical node-based reasoning systems for financial document QA across 1,200 SEC filings, finding vector-based systems achieved a 68% win rate. Concurrently, the PageIndex framework [2] demonstrated 98.7% accuracy on FinanceBench through purely reasoning-based retrieval. This paper extends their work by: (i) implementing and evaluating three retrieval architectures: Vector RAG, Tree Reasoning, and the proposed Adaptive Hybrid Retrieval (AHR) across financial, legal, and medical domains; (ii) introducing a four-tier query complexity benchmark; and (iii) employing GPT-4-powered LLM-as-judge evaluation. Experiments reveal that Tree Reasoning achieves…
