When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
Dongxin Guo, Jikun Wu, Siu Ming Yiu

TL;DR
ReaLM-Retrieve is a reasoning-aware retrieval framework that injects evidence during multi-step reasoning rather than only before it begins, improving answer accuracy while reducing the number of retrieval calls made by large reasoning models.
Contribution
It introduces a step-level uncertainty detector, a learned retrieval intervention policy, and an efficient integration mechanism for reasoning-aware retrieval.
Findings
Achieves a 10.1% absolute improvement in answer F1 over standard RAG.
Reduces retrieval calls by 47% compared to fixed-interval approaches.
Improves retrieval quality, reaching 81.3% Recall@5 and establishing new efficiency-accuracy trade-offs.
Abstract
Large reasoning models such as DeepSeek-R1 and OpenAI o1 generate extended chains of thought spanning thousands of tokens, yet their integration with retrieval-augmented generation (RAG) remains fundamentally misaligned. Current RAG systems optimize for providing context before reasoning begins, while reasoning models require evidence injection during multi-step inference chains. We introduce ReaLM-Retrieve, a reasoning-aware retrieval framework that addresses this mismatch through three key innovations: (1) a step-level uncertainty detector that identifies knowledge gaps at reasoning-step granularity rather than token or sentence level; (2) a retrieval intervention policy that learns when external evidence maximally benefits ongoing reasoning; and (3) an efficiency-optimized integration mechanism that reduces per-retrieval overhead by 3.2x compared to naive integration. Experiments on…
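To make the idea of step-level retrieval triggering concrete, the following is a minimal sketch and not the authors' implementation. It assumes each reasoning step comes with per-token probability distributions, uses mean token entropy as a stand-in for the uncertainty detector, and replaces the learned intervention policy with a fixed threshold. All names (`Step`, `step_uncertainty`, `reason_with_retrieval`, `retrieve`, `threshold`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One reasoning step: its text plus a per-token probability distribution.
# Both the data layout and every name below are hypothetical.
Step = Tuple[str, Sequence[Sequence[float]]]


def step_uncertainty(token_dists: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (in nats) over the tokens of a single step."""
    if not token_dists:
        return 0.0
    entropies = [
        -sum(p * math.log(p) for p in dist if p > 0.0) for dist in token_dists
    ]
    return sum(entropies) / len(entropies)


def reason_with_retrieval(
    steps: Sequence[Step],
    retrieve: Callable[[str], List[str]],
    threshold: float = 0.5,
) -> Tuple[List[str], int]:
    """Walk the steps in order, retrieving only after uncertain ones.

    A fixed threshold stands in for the learned intervention policy.
    Returns the accumulated context and the number of retrieval calls.
    """
    context: List[str] = []
    calls = 0
    for text, token_dists in steps:
        context.append(text)
        if step_uncertainty(token_dists) > threshold:
            # Assumed behavior: query with the uncertain step's own text.
            context.extend(retrieve(text))
            calls += 1
    return context, calls
```

In this sketch, a confident step (low entropy) passes through without a retrieval call, while an uncertain step triggers one and its evidence is appended before the next step. A fixed-interval baseline would instead call `retrieve` every k steps regardless of uncertainty.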
