Harnessing the Power of Reinforcement Learning for Language-Model-Based Information Retriever via Query-Document Co-Augmentation

Jingming Liu; Yumeng Li; Wei Shi; Yao-Xiang Ding; Hui Su; Kun Zhou

arXiv:2506.18670 · cs.IR · June 24, 2025


4 Reviews

TL;DR

This paper introduces a reinforcement learning framework that trains large language models to augment both queries and documents, significantly improving retrieval performance, especially in challenging domains.

Contribution

It presents a novel bidirectional RL approach that jointly optimizes query and document augmentation policies, addressing entangled reward challenges in LLM-based retrieval.

Findings

1. Enhanced retrieval accuracy in sparse and dense settings
2. Significant improvements in difficult retrieval domains
3. Strong generalization across benchmarks

Abstract

Recent studies have proposed leveraging Large Language Models (LLMs) as information retrievers through query rewriting. However, for challenging corpora, we argue that enhancing queries alone is insufficient for robust semantic matching; the LLM should also have sufficient understanding of the corpus by directly handling and augmenting the documents themselves. To this end, we present an LLM-based retriever empowered to augment both user queries and corpus documents, with its policy fully explored via reinforcement learning (RL) and minimal human inductive bias. Notably, we find that simply allowing the LLM to modify documents yields little benefit unless paired with our carefully designed bidirectional RL framework, which enables the LLM to simultaneously learn and collaborate on both query and document augmentation policies. A key technical challenge in realizing such a framework lies…

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The bidirectional RL formulation yields new findings in IR by aligning query and document semantics through co-augmentation.

Weaknesses

1. The work focuses on query–document co-augmentation, yet does not compare against established query expansion or document expansion methods (e.g., Query2Doc). As a result, it remains unclear whether the proposed RL-based co-augmentation offers consistent advantages over existing augmentation paradigms.
2. The paper does not provide the full prompts used for augmentation, despite prompts being central to model behavior.
3. In Table 1, models such as “Qwen2.5-7B” are presented under the headin…

Reviewer 02 · Rating 4 · Confidence 4

Strengths

• Addresses an under-explored bidirectional augmentation problem for retrieval.
• Provides thoughtful engineering to make joint RL training feasible.
• Strong empirical gains over static query/document augmentation baselines.
• Includes qualitative analysis showing lexical alignment between queries and documents.

Weaknesses

• Missing comparison with DeepRetrieval (Jiang et al., 2025), which is explicitly cited but not benchmarked. That work already employs on-policy RL for retrieval optimization and serves as the most relevant baseline.
• The improvements (e.g., +0.02–0.04 NDCG@10) are relatively small considering the complexity of the method.
• Limited evaluation: only a few BEIR datasets, no real-engine or large-scale web-retrieval experiments.
• The reward-sampling estimator lacks theoretical analysis or vari…

Reviewer 03 · Rating 2 · Confidence 4
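
For context on the NDCG@10 differences quoted in this review, the following is a minimal sketch of how the metric is commonly computed. It is not taken from the paper or the review. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the gain is linear (some evaluation tools use `2**rel - 1` instead), and the input list is assumed to hold the relevance labels of all judged documents for one query in ranked order, so that sorting it gives the ideal ranking.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    # Rank 0 is the top result; the discount is log2(rank + 2), so the first item is undiscounted.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # A query with no relevant documents is scored 0.0 by convention in this sketch.
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Example: the single relevant document is ranked second instead of first.
score = ndcg_at_k([0, 1, 0])
```

Benchmark scores are usually the mean of this per-query value over all test queries, so the deltas the reviewer cites are differences between such averages.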

Strengths

- The paper develops an RL framework that jointly augments both queries and documents, moving beyond the typical one-sided query rewriting or document expansion approaches in LLM-based retrieval.
- The batch–unbatch alternating mechanism demonstrates thoughtful engineering that allows the framework to remain compatible with standard LLM RL pipelines.

Weaknesses

- This method requires augmenting both the query and the document, which poses significant computational efficiency challenges in practical retrieval tasks, especially when performing full-scale document rewriting.
- There is no comparison to existing LLM-based retrievers or question rewriting works.
- The experimental results show only marginal improvement over the base model, while introducing a much more complex training process. I notice that the performance gap between using LLM augmentat…

Reviewer 04 · Rating 2 · Confidence 5

Strengths

The paper introduces a well-motivated reinforcement learning framework that jointly optimizes both query and document representations, rather than treating them as separate components. The experiments are well-structured, spanning both sparse (BM25) and dense (BGE-base-en-v1.5) retrieval backbones.

Weaknesses

Overall, the paper’s idea is interesting, but the presentation is vague and lacks clarity, especially in mathematical formulation and experimental description. The following issues should be addressed to improve readability and credibility.

**Clarity**

1. The paper does not provide clear mathematical notations when describing the reward function and other key components of the framework. This makes the overall process vague and hard to follow. The authors are encouraged to formally define all i…
