Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

Artyom Sorokin; Nazar Buzun; Alexander Anokhin; Oleg Inozemcev; Egor Vedernikov; Petr Anokhin; Mikhail Burtsev; Trushkov Alexey; Yin Wenshuai; Evgeny Burnaev

arXiv:2511.07328 · cs.LG · May 5, 2026


TL;DR

Q-RAG introduces a reinforcement-learning-based method for fine-tuning the embedder used in multi-step retrieval. This enables efficient long-context question answering and achieves state-of-the-art results on long-context benchmarks.

Contribution

Q-RAG is a novel, resource-efficient approach to multi-step retrieval. It trains the embedder with reinforcement learning instead of fine-tuning an LLM, and it outperforms existing multi-step retrieval methods.

Findings

01. Achieves state-of-the-art results on the BABILong and RULER benchmarks.

02. Supports contexts of up to 10 million tokens.

03. Offers a resource-efficient alternative to fine-tuning small LLMs for multi-step retrieval.

Abstract

Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context for LLMs, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step search. Recently, multi-step retrieval approaches have emerged, typically involving the fine-tuning of small LLMs to perform multi-step retrieval. This type of fine-tuning is highly resource-intensive and does not enable the use of larger LLMs. In this work, we propose Q-RAG, a novel approach that fine-tunes the Embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular…
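To make "value-based embedder training for multi-step retrieval" concrete, here is a minimal sketch built on several illustrative assumptions that are not the paper's formulation. Q(s, a) is scored as the dot product between a state embedding (the query plus the chunks retrieved so far) and a candidate chunk embedding. The policy greedily picks the highest-value unretrieved chunk at each step. Training targets follow a standard one-step Q-learning rule. `mean_state_encoder` is a hypothetical stand-in for a trained state encoder, and `gamma` is an assumed discount factor.

```python
import numpy as np


def q_values(state_emb: np.ndarray, chunk_embs: np.ndarray) -> np.ndarray:
    """Assumed Q(s, a) for every candidate chunk a: dot product with the state embedding."""
    return chunk_embs @ state_emb


def mean_state_encoder(query_emb: np.ndarray, selected_embs: np.ndarray) -> np.ndarray:
    """Hypothetical state encoder: average of the query and the retrieved chunk embeddings."""
    return np.vstack([query_emb[None, :], selected_embs]).mean(axis=0)


def multi_step_retrieve(query_emb, chunk_embs, num_steps, encode_state=mean_state_encoder):
    """Greedy multi-step retrieval: pick the best unretrieved chunk, then update the state."""
    selected = []
    state = query_emb
    for _ in range(min(num_steps, len(chunk_embs))):
        q = q_values(state, chunk_embs)
        q[selected] = -np.inf  # never retrieve the same chunk twice
        best = int(np.argmax(q))
        selected.append(best)
        state = encode_state(query_emb, chunk_embs[selected])
    return selected


def td_target(reward, next_state_emb, chunk_embs, selected, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max over unretrieved chunks of Q(s', a')."""
    if done:
        return float(reward)
    q_next = q_values(next_state_emb, chunk_embs)
    q_next[selected] = -np.inf  # actions already taken are unavailable in s'
    return float(reward) + gamma * float(np.max(q_next))


if __name__ == "__main__":
    chunks = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
    query = np.array([1.0, 0.0])
    print(multi_step_retrieve(query, chunks, num_steps=2))
```

In this sketch the embedder's similarity score doubles as the value function. That is why, under these assumptions, training only the embedder is enough to learn a multi-step retrieval policy without fine-tuning an LLM.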

Code & Models

Repositories

griver/Q-RAG (GitHub)

Videos

Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training (SlidesLive)