DIVER: A Multi-Stage Approach for Reasoning-intensive Information Retrieval

Duolin Sun; Meixiu Long; Dan Yang; Junjie Wang; Yecheng Luo; Yue Shen; Jian Wang; Hualei Zhou; Chunxiao Guo; Peng Wei; Jiahai Wang; Jinjie Gu

arXiv:2508.07995·cs.IR·April 3, 2026


TL;DR

DIVER is a multi-stage retrieval pipeline for reasoning-intensive information retrieval. It enhances query understanding, employs a reasoning-aware retrieval model, and applies sophisticated reranking, achieving state-of-the-art results on the BRIGHT benchmark.

Contribution

The paper introduces DIVER, a novel multi-stage retrieval system tailored for reasoning-intensive tasks, combining query expansion, reasoning-aware retrieval, and advanced reranking.

Findings

01

DIVER achieves a state-of-the-art overall nDCG@10 score of 46.8 on the BRIGHT benchmark.

02

It outperforms existing reasoning-aware models on the BRIGHT benchmark.

03

The approach effectively handles complex, reasoning-based queries.
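
For readers unfamiliar with the metric behind the findings above, here is a minimal sketch of nDCG@10 for a single query. It is not the benchmark's official evaluation code. The function names and the toy relevance labels are illustrative assumptions, and the sketch uses the common linear-gain form of DCG. Benchmark tables typically report the score averaged over queries and multiplied by 100.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    # Rank 1 gets discount log2(2) = 1, rank 2 gets log2(3), and so on.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, all_gains, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_gains: relevance labels of the returned documents, in ranked order.
    all_gains:    relevance labels of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

# Toy example (hypothetical labels): two relevant documents exist,
# and the system ranks them at positions 1 and 3.
score = ndcg_at_k([1, 0, 1], [1, 1, 0])
print(round(100 * score, 1))
```

A ranking that places every relevant document first scores 1.0 (100 on the scaled reporting convention), and a query with no relevant documents is scored 0.0 in this sketch.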

Abstract

Retrieval-augmented generation has achieved strong performance on knowledge-intensive tasks where query-document relevance can be identified through direct lexical or semantic matches. However, many real-world queries involve abstract reasoning, analogical thinking, or multi-step inference, which existing retrievers often struggle to capture. To address this challenge, we present DIVER, a retrieval pipeline designed for reasoning-intensive information retrieval. It consists of four components. The document preprocessing stage enhances readability and preserves content by cleaning noisy texts and segmenting long documents. The query expansion stage leverages large language models to iteratively refine user queries with explicit reasoning and evidence from retrieved documents. The retrieval stage employs a model fine-tuned on synthetic data spanning medical and mathematical domains, along…
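
The iterative query expansion idea described in the abstract can be sketched as a retrieve-then-rewrite loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `expand_query`, `toy_retrieve`, and `toy_rewrite` are hypothetical names, the retriever is a word-overlap stand-in, and the rewrite step is a placeholder for where a large language model call would go.

```python
from typing import Callable, List

def expand_query(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    rewrite: Callable[[str, str, List[str]], str],
    rounds: int = 2,
    top_k: int = 5,
) -> str:
    """Alternate retrieval and rewriting for a fixed number of rounds."""
    expanded = query
    for _ in range(rounds):
        docs = retrieve(expanded, top_k)          # evidence for the current query
        expanded = rewrite(query, expanded, docs)  # refine using that evidence
    return expanded

# Hypothetical stand-ins so the sketch runs without any model or index.
CORPUS = [
    "binary search halves the interval",
    "heaps support fast minimum extraction",
]

def toy_retrieve(q: str, k: int) -> List[str]:
    """Rank corpus documents by word overlap with the query."""
    words = set(q.lower().split())
    ranked = sorted(CORPUS, key=lambda d: len(words & set(d.split())), reverse=True)
    return ranked[:k]

def toy_rewrite(original: str, current: str, docs: List[str]) -> str:
    """Placeholder for an LLM call: append the top document if it is new."""
    if docs and docs[0] not in current:
        return current + " " + docs[0]
    return current

result = expand_query("why does binary search run fast", toy_retrieve, toy_rewrite, top_k=1)
print(result)
```

Passing the retriever and rewriter in as callables keeps the loop independent of any particular model, so the stand-ins can be swapped for a real retriever and LLM client.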
