Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval
Yulong Hui, Chao Chen, Zhihang Fu, Yihao Liu, Jieping Ye, Huanchen Zhang

TL;DR
Interact-RAG introduces an active retrieval manipulation paradigm for LLMs, enabling fine-grained control and reasoning in information-seeking tasks, leading to superior performance over existing black-box retrieval methods.
Contribution
This work presents a novel interaction framework with a Corpus Interaction Engine, allowing LLMs to actively manipulate retrieval, and develops a reasoning-enhanced workflow for improved RAG performance.
Findings
Outperforms existing methods on six benchmarks
Enables zero-shot execution and interaction trajectory synthesis
Improves retrieval accuracy and task success rates
Abstract
Retrieval-Augmented Generation (RAG) has significantly enhanced LLMs by incorporating external information. However, prevailing agentic RAG approaches are constrained by a critical limitation: they treat the retrieval process as a black-box querying operation. This confines the agent's actions to issuing queries, hindering its ability to tackle complex information-seeking tasks. To address this, we introduce Interact-RAG, a new paradigm that elevates the LLM agent from a passive query issuer into an active manipulator of the retrieval process. We dismantle the black box with a Corpus Interaction Engine, equipping the agent with a set of action primitives for fine-grained control over information retrieval. To further empower the agent across the entire RAG pipeline, we first develop a reasoning-enhanced workflow, which enables both zero-shot execution and the synthesis of interaction…
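The minimal Python sketch below makes "action primitives" concrete by implementing a few of them over a SQLite FTS5 index. One of the peer reviews notes that the engine relies on SQLite FTS and simple filters. The function names (`multi_faceted_search`, `entity_match`, `shape_doc`), the toy corpus, and each primitive's exact behavior are illustrative assumptions, not the authors' implementation:

- Multi-faceted retrieval is modeled as one BM25 keyword query per facet, with the results merged.
- Entity matching is an FTS5 exact-phrase filter.
- Scale adjustment is the `k` cutoff.
- Document shaping is a fixed word window around a keyword.

The sketch assumes a Python build whose bundled SQLite includes FTS5.

```python
import sqlite3

# Hypothetical toy corpus for illustration only (not the paper's data).
DOCS = [
    (1, "Marie Curie", "Marie Curie won the Nobel Prize in Physics in 1903 and in Chemistry in 1911."),
    (2, "Pierre Curie", "Pierre Curie shared the 1903 Nobel Prize in Physics with Marie Curie."),
    (3, "Radium", "Radium was discovered by Marie and Pierre Curie in 1898."),
    (4, "Albert Einstein", "Albert Einstein received the Nobel Prize in Physics in 1921."),
]


def build_index(docs):
    """Load (rowid, title, body) tuples into an in-memory SQLite FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
    conn.executemany("INSERT INTO docs(rowid, title, body) VALUES (?, ?, ?)", docs)
    return conn


def multi_faceted_search(conn, facets, k=3):
    """Issue one keyword query per facet and merge hits, keeping each doc's best score.

    FTS5's bm25() is lower-is-better, so results are sorted ascending.
    `k` acts as a hypothetical "adjust scale" knob: how many hits to keep.
    """
    best = {}
    for facet in facets:
        rows = conn.execute(
            "SELECT rowid, bm25(docs) FROM docs WHERE docs MATCH ? "
            "ORDER BY bm25(docs) LIMIT ?",
            (facet, k),
        ).fetchall()
        for rowid, score in rows:
            if rowid not in best or score < best[rowid]:
                best[rowid] = score
    return sorted(best, key=best.get)[:k]


def entity_match(conn, rowids, entity):
    """Anchored matching: keep only candidates containing the exact entity phrase."""
    if not rowids:
        return []
    phrase = '"' + entity.replace('"', '""') + '"'  # FTS5 phrase query
    marks = ",".join("?" for _ in rowids)
    rows = conn.execute(
        f"SELECT rowid FROM docs WHERE docs MATCH ? AND rowid IN ({marks})",
        (phrase, *rowids),
    ).fetchall()
    keep = {r for (r,) in rows}
    return [r for r in rowids if r in keep]  # preserve the incoming ranking


def shape_doc(conn, rowid, keyword, window=6):
    """Doc shaping: return only a word window around the first keyword hit."""
    (body,) = conn.execute("SELECT body FROM docs WHERE rowid = ?", (rowid,)).fetchone()
    words = body.split()
    hits = [i for i, w in enumerate(words) if keyword.lower() in w.lower()]
    center = hits[0] if hits else 0  # fall back to the start of the document
    lo = max(0, center - window // 2)
    return " ".join(words[lo:lo + window])


conn = build_index(DOCS)
hits = multi_faceted_search(conn, ["Nobel Physics", "Curie 1903"], k=4)
anchored = entity_match(conn, hits, "Marie Curie")
print([shape_doc(conn, r, "1903", window=6) for r in anchored])
```

Each primitive takes and returns plain row IDs, so an agent can chain them instead of rewriting a single opaque query. For example, it can search broadly, anchor on an entity, and then shape only the surviving documents.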
Peer Reviews
Decision: ICLR 2026 Poster
1. The questions posed by this paper are critical to the advancement of LLM agents. The core problem, that agents are "stuck" in inefficient query-reformulation loops and lack fine-grained control over their tools, is a widely recognized and significant unsolved challenge in the field. This paper tackles this gap head-on, addressing the fundamental limitations of the agent-retriever interface. 2. The paper is exceptionally rigorous. The design of the Interact-RAG-Workflow as a dual-purpose system
1. This is the paper's most significant weakness. The authors admit in Appendix C.3 that the standard 2018 Wikipedia dump has "mismatches" and "missing evidence," leading them to "construct a more faithful corpus". This is a major methodological decision that clouds the results. The paper does not explicitly state that the baselines (Search-R1, R-Search, etc.) were re-evaluated on this new "faithful corpus." If they were not, the 22.5% gain and SOTA claims are confounded, as the comparison would
1. This paper has practical significance in addressing a key limitation in agentic RAG—namely, the lack of retrieval control—making it potentially useful for real-world systems. 2. The paper is clearly written and easy to follow, with well-structured explanations that make the methodology and experiments accessible.
1. The proposed Corpus Interaction Engine appears incremental rather than fundamentally novel; it mainly adds retrieval modes. This does not fully realize the claim in the introduction of “transforming the agent from a passive query issuer to an active participant.” 2. Missing fine-grained ablations: ablation studies for the specific components (Multi-Faceted Retrieval, Entity Match, Adjust Scale, and Doc Shaping) are absent, making it unclear which modules drive the observed performanc
Originality: Proposes a corpus interaction engine with actionable primitives (multi-faceted retrieval, anchored matching, context shaping) and a hierarchical reasoning workflow (planner–reasoner–executor), moving beyond black-box query reformulation. The integration of training-free trajectory synthesis with SFT+RL is thoughtfully designed. Quality: Empirical evaluation is thorough, covering diverse datasets, strong baselines, and ablations that isolate each component’s contribution. Analyses o
Scalability and deployment realism: The interaction engine relies on SQLite FTS and simple filters. This keeps things lightweight but raises questions about performance on large or sharded corpora, multi-tenant settings, and streaming updates. There is no wall-clock or throughput/latency analysis, nor memory/compute footprint or cost per question. Without these, claims about efficiency (fewer iterations) are not tied to practical runtime advantages. Retrieval quality metrics: The paper focuses
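One review describes a hierarchical workflow (planner–reasoner–executor), which can be sketched as a loop over pluggable callables. This is a hypothetical skeleton, and several of its details are assumptions for illustration:

- `run_workflow` and the `plan`/`reason`/`execute` signatures.
- The `(primitive, argument)` action tuple.
- The stopping rule: the loop ends when the planner returns no sub-goals or after `max_rounds`.

In practice, each role would presumably be an LLM call, and `execute` would dispatch to retrieval primitives. The recorded `trace` illustrates the kind of interaction trajectory that, per the abstract, the workflow can synthesize.

```python
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # hypothetical: (primitive name, argument)
Step = Tuple[str, Action, List[str]]  # (sub-goal, action taken, evidence returned)


def run_workflow(
    question: str,
    plan: Callable[[str, List[str]], List[str]],
    reason: Callable[[str, List[str]], Action],
    execute: Callable[[Action], List[str]],
    max_rounds: int = 3,
) -> Tuple[List[str], List[Step]]:
    """Hypothetical planner-reasoner-executor loop; returns evidence and the trace."""
    evidence: List[str] = []
    trace: List[Step] = []
    for _ in range(max_rounds):
        # Planner: decide which sub-goals are still unresolved given current evidence.
        sub_goals = plan(question, evidence)
        if not sub_goals:
            break  # the planner judges the evidence sufficient
        for goal in sub_goals:
            # Reasoner: choose an action primitive and its argument for this sub-goal.
            action = reason(goal, evidence)
            # Executor: run the action against the corpus and collect results.
            result = execute(action)
            trace.append((goal, action, result))
            evidence.extend(result)
    return evidence, trace


# Toy usage with rule-based stand-ins for the three LLM roles.
CORPUS = {"who discovered radium": ["Radium was discovered by Marie and Pierre Curie."]}
evidence, trace = run_workflow(
    "who discovered radium",
    plan=lambda q, ev: [] if ev else [q],
    reason=lambda goal, ev: ("search", goal),
    execute=lambda action: CORPUS.get(action[1], []),
)
```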
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications
