TL;DR
DIVER is a multi-stage retrieval pipeline for reasoning-intensive information retrieval. It improves query understanding, uses a reasoning-aware retrieval model, and applies sophisticated reranking, achieving state-of-the-art results on the BRIGHT benchmark.
Contribution
The paper introduces DIVER, a novel multi-stage retrieval system tailored for reasoning-intensive tasks, combining query expansion, reasoning-aware retrieval, and advanced reranking.
Findings
DIVER achieves a state-of-the-art overall nDCG@10 of 46.8.
It outperforms existing reasoning-aware models on the BRIGHT benchmark.
The approach effectively handles complex, reasoning-based queries.
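For readers unfamiliar with the metric in the findings above, here is a minimal sketch of how nDCG@10 is commonly computed for a single query. It assumes linear gain (the relevance label itself) and a log2 rank discount. Some evaluations use an exponential gain instead, and the summary does not say which variant the paper's evaluation uses. The function names are illustrative only. Benchmark figures such as 46.8 are presumably this per-query value averaged over queries and scaled by 100.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    # Rank positions are 1-based, so position p is discounted by log2(p + 1).
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    all_relevances: Optional[Sequence[float]] = None,
    k: int = 10,
) -> float:
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: labels of the retrieved documents, in ranked order.
    all_relevances: labels of every judged document for the query; if omitted,
    the ideal ranking is built from the retrieved labels alone (an assumption
    that overestimates the score when relevant documents were missed).
    """
    pool = all_relevances if all_relevances is not None else ranked_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

A ranking that places the only relevant document first scores 1.0; placing it second scores 1 / log2(3), roughly 0.63.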
Abstract
Retrieval-augmented generation has achieved strong performance on knowledge-intensive tasks where query-document relevance can be identified through direct lexical or semantic matches. However, many real-world queries involve abstract reasoning, analogical thinking, or multi-step inference, which existing retrievers often struggle to capture. To address this challenge, we present DIVER, a retrieval pipeline designed for reasoning-intensive information retrieval. It consists of four components. The document preprocessing stage enhances readability and preserves content by cleaning noisy texts and segmenting long documents. The query expansion stage leverages large language models to iteratively refine user queries with explicit reasoning and evidence from retrieved documents. The retrieval stage employs a model fine-tuned on synthetic data spanning medical and mathematical domains, along…
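The first two stages described in the abstract (segmenting long documents, then iteratively refining the query with evidence from retrieved documents) can be illustrated with a toy sketch. This is not the authors' implementation. Every name here is hypothetical: a word-overlap scorer stands in for the fine-tuned retriever, and `toy_rewrite` stands in for the large language model call.

```python
from typing import Callable, List


def segment(text: str, max_words: int = 50) -> List[str]:
    """Split a long document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def lexical_retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank chunks by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def expand_query(
    query: str,
    chunks: List[str],
    rewrite: Callable[[str, List[str]], str],
    rounds: int = 2,
    k: int = 2,
) -> str:
    """Alternate retrieval and rewriting for a fixed number of rounds."""
    expanded = query
    for _ in range(rounds):
        evidence = lexical_retrieve(expanded, chunks, k)
        # In a real system this would be an LLM call that reasons over the evidence.
        expanded = rewrite(query, evidence)
    return expanded


def toy_rewrite(original_query: str, evidence: List[str]) -> str:
    """Placeholder for the LLM: append the evidence text to the original query."""
    return original_query + " " + " ".join(evidence)


chunks = ["the cat sat on the mat", "dogs chase cats", "stock markets fell"]
expanded_query = expand_query("cat", chunks, toy_rewrite, rounds=2, k=1)
```

Passing the rewrite step in as a function keeps the loop independent of any particular model, so the placeholder can be swapped for a real LLM call without changing the control flow.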
