Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG

Yicheng Zhang; Zhen Qin; Zhaomin Wu; Wenqi Zhang; Shuiguang Deng

arXiv:2602.03645·cs.LG·February 4, 2026

TL;DR

This paper introduces reinforcement-learning-based fine-tuning for history-aware dense retrievers in RAG. It addresses the objective mismatch between retriever training and the goal of the RAG pipeline by replacing deterministic retrieval with stochastic sampling and by incorporating retrieval history.

Contribution

The paper proposes an RL-based retriever optimization method that incorporates retrieval history and replaces deterministic retrieval with stochastic sampling, so that the retriever can be optimized with RL.

Findings

1. Consistent performance improvements across diverse RAG pipelines.

2. Effective mitigation of state aliasing through retrieval history.

3. Enhanced alignment between retriever training and RAG objectives.

Abstract

Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on how well the retriever matches the LLM. Retriever optimization has emerged as an efficient alternative to fine-tuning LLMs. However, existing solutions suffer from an objective mismatch between retriever optimization and the goal of the RAG pipeline. Reinforcement learning (RL) provides a promising way to address this limitation, yet applying RL to retriever optimization introduces two fundamental challenges: 1) deterministic retrieval is incompatible with RL formulations, and 2) state aliasing arises from query-only retrieval in multi-hop reasoning. To address these challenges, we replace deterministic retrieval with stochastic sampling and formulate RAG as a Markov decision process, making the retriever optimizable by RL. Further, we…
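
The contrast between deterministic and stochastic retrieval can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract shown here does not specify the sampling scheme, so the softmax-over-scores policy, the without-replacement (Plackett-Luce style) sampling, the `temperature` parameter, and all function names below are assumptions made for illustration. The point is only that a sampled retrieval has a log-probability, which a policy-gradient method needs, whereas a fixed top-k selection does not.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k):
    """Deterministic retrieval: always the k highest-scoring documents."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def sample_documents(scores, k, rng, temperature=1.0):
    """Stochastic retrieval (assumed scheme): draw k distinct documents
    one at a time, renormalising over the documents not yet chosen.

    Returns the chosen indices and the log-probability of that ordered draw.
    """
    remaining = list(range(len(scores)))
    chosen, log_prob = [], 0.0
    for _ in range(k):
        probs = softmax([scores[i] for i in remaining], temperature)
        pick = rng.choices(range(len(remaining)), weights=probs, k=1)[0]
        log_prob += math.log(probs[pick])
        chosen.append(remaining.pop(pick))
    return chosen, log_prob


# Hypothetical query-document similarity scores for four documents.
scores = [2.0, 1.5, 0.3, -1.0]
rng = random.Random(0)

fixed = top_k(scores, k=2)                           # same result every call
docs, log_prob = sample_documents(scores, k=2, rng=rng)  # varies with the RNG
```

In an RL setup of this kind, `log_prob` would be weighted by a reward derived from the downstream answer, and lowering `temperature` moves the sampled behaviour toward the deterministic `top_k` result.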

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior