TL;DR
This paper introduces a reinforcement learning framework for training large language models to augment both queries and documents, significantly improving retrieval performance, especially in challenging domains.
Contribution
It presents a novel bidirectional RL approach that jointly optimizes query and document augmentation policies, addressing entangled reward challenges in LLM-based retrieval.
Findings
Enhanced retrieval accuracy in sparse and dense settings
Significant improvements in difficult retrieval domains
Strong generalization across benchmarks
Abstract
Recent studies have proposed leveraging Large Language Models (LLMs) as information retrievers through query rewriting. However, for challenging corpora, we argue that enhancing queries alone is insufficient for robust semantic matching; the LLM should also have sufficient understanding of the corpus by directly handling and augmenting the documents themselves. To this end, we present an LLM-based retriever empowered to augment both user queries and corpus documents, with its policy fully explored via reinforcement learning (RL) and minimal human inductive bias. Notably, we find that simply allowing the LLM to modify documents yields little benefit unless paired with our carefully designed bidirectional RL framework, which enables the LLM to simultaneously learn and collaborate on both query and document augmentation policies. A key technical challenge in realizing such a framework lies…
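To make the co-augmentation idea concrete, here is a minimal sketch of how a retrieval-based reward could be computed when both the query and the documents are rewritten. It is not the authors' implementation: the token-overlap scorer is a toy stand-in for a sparse retriever such as BM25, NDCG@10 is assumed as the reward because it is the metric the reviews quote, and the hard-coded "augmented" strings are hypothetical placeholders for LLM policy outputs. The names `overlap_score`, `ndcg_at_k`, and `retrieval_reward` are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (toy assumption, not a real analyzer)."""
    return text.lower().split()


def overlap_score(query, doc):
    """Toy lexical score: how often the query's distinct tokens occur in the doc."""
    doc_counts = Counter(tokenize(doc))
    return sum(doc_counts[t] for t in set(tokenize(query)))


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k over a ranked list of document ids."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant_ids))))
    return dcg / ideal if ideal > 0 else 0.0


def retrieval_reward(query, corpus, relevant_ids, k=10):
    """Rank the corpus for the query and return NDCG@k as a scalar reward."""
    # sorted() is stable, so tied documents keep their corpus order.
    ranked = sorted(corpus, key=lambda d: overlap_score(query, corpus[d]), reverse=True)
    return ndcg_at_k(ranked, relevant_ids, k)


# Hypothetical data: the relevant document shares little vocabulary with the query.
corpus = {
    "d2": "heart healthy recipes for dinner",
    "d3": "treatment of seasonal allergies",
    "d1": "myocardial infarction treatment guidelines",
}
query = "heart attack treatment"
relevant = {"d1"}

# Hypothetical augmentations standing in for what an LLM policy might emit.
augmented_query = query + " myocardial infarction"
augmented_corpus = dict(corpus, d1=corpus["d1"] + " heart attack")

baseline = retrieval_reward(query, corpus, relevant)
co_augmented = retrieval_reward(augmented_query, augmented_corpus, relevant)
print(baseline, co_augmented)
```

In this toy setup the reward for a query rewrite depends on the current state of the documents and vice versa, which gives a rough intuition for why the two augmentation policies cannot be scored in isolation.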
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The bidirectional RL formulation gives new findings in IR by aligning query and document semantics through co-augmentation
1. The work focuses on query–document co-augmentation, yet does not compare against established query expansion or document expansion methods (e.g., Query2Doc). As a result, it remains unclear whether the proposed RL-based co-augmentation offers consistent advantages over existing augmentation paradigms.
2. The paper does not provide the full prompts used for augmentation, despite prompts being central to model behavior.
3. In Table 1, models such as “Qwen2.5-7B” are presented under the headin
• Addresses an under-explored bidirectional augmentation problem for retrieval.
• Provides thoughtful engineering to make joint RL training feasible.
• Strong empirical gains over static query/document augmentation baselines.
• Includes qualitative analysis showing lexical alignment between queries and documents
• Missing comparison with DeepRetrieval (Jiang et al., 2025), which is explicitly cited but not benchmarked. That work already employs on-policy RL for retrieval optimization and serves as the most relevant baseline.
• The improvements (e.g., +0.02–0.04 NDCG@10) are relatively small considering the complexity of the method.
• Limited evaluation: only a few BEIR datasets, no real-engine or large-scale web-retrieval experiments.
• The reward-sampling estimator lacks theoretical analysis or vari
- The paper developed an RL framework that jointly augments both queries and documents, moving beyond the typical one-sided query rewriting or document expansion approaches in LLM-based retrieval.
- The batch–unbatch alternating mechanism demonstrates thoughtful engineering that allows the framework to remain compatible with standard LLM RL pipelines.
- This method requires augmenting both the query and the document, which poses significant computational efficiency challenges in practical retrieval tasks, especially when performing full-scale document rewriting.
- There is no comparison to existing LLM-based retrievers or question rewriting works.
- The experimental results show only marginal improvement over the base model, while introducing a much more complex training process. I notice that the performance gap between using LLM augmentat
The paper introduces a well-motivated reinforcement learning framework that jointly optimizes both query and document representations, rather than treating them as separate components. The experiments are well-structured, spanning both sparse (BM25) and dense (BGE-base-en-v1.5) retrieval backbones.
Overall, the paper’s idea is interesting, but the presentation is vague and lacks clarity, especially in mathematical formulation and experimental description. The following issues should be addressed to improve readability and credibility.
**Clarity**
1. The paper does not provide clear mathematical notations when describing the reward function and other key components of the framework. This makes the overall process vague and hard to follow. The authors are encouraged to formally define all i
