Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning

Wenda Wei; Yu-An Liu; Ruqing Zhang; Jiafeng Guo; Lixin Su; Shuaiqiang Wang; Dawei Yin; Maarten de Rijke; Xueqi Cheng

arXiv:2511.09109·cs.CL·December 29, 2025

Open Access

TL;DR

This paper introduces Bi-RAR, a multi-objective reinforcement learning framework for retrieval-augmented reasoning that evaluates each intermediate step in both the forward and backward directions, improving performance on complex multi-step question answering.

Contribution

It proposes a novel bidirectional evaluation method and a multi-objective RL approach to enhance reasoning accuracy in retrieval-augmented models.

Findings

01 Outperforms previous methods on seven QA benchmarks.

02 Effectively integrates search engine interaction during training.

03 Improves reasoning quality with bidirectional step evaluation.

Abstract

Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models, yet its effectiveness remains limited in complex, multi-step reasoning scenarios. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. Most approaches rely on outcome-based supervision, offering no explicit guidance for intermediate steps. This often leads to reward hacking and degraded response quality. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions. To assess the information completeness of each step, we introduce a bidirectional information distance grounded in Kolmogorov complexity, approximated via language model generation probabilities. This quantification measures both how far the current…
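
As an illustration of the idea of approximating Kolmogorov complexity with language model generation probabilities, the following is a minimal sketch, not the paper's formulation. It assumes the standard code-length approximation K(y | x) ≈ -Σ log2 p(y_t | x, y_<t) and the classical max-form information distance max(K(y | x), K(x | y)). The function names and the toy log-probabilities are hypothetical; in practice the per-token log-probabilities would come from a language model scoring one text conditioned on the other.

```python
import math
from typing import Sequence


def code_length_bits(token_logprobs: Sequence[float]) -> float:
    """Approximate K(y | x) by the code length -sum(log2 p(y_t | x, y_<t)).

    `token_logprobs` holds natural-log probabilities that a language model
    assigned to each token of y when conditioned on x (hypothetical input).
    """
    return -sum(token_logprobs) / math.log(2)


def information_distance(logprobs_y_given_x: Sequence[float],
                         logprobs_x_given_y: Sequence[float]) -> float:
    """Max-form information distance in bits: max(K(y | x), K(x | y)).

    This is an assumed, generic definition used for illustration only.
    """
    return max(code_length_bits(logprobs_y_given_x),
               code_length_bits(logprobs_x_given_y))


# Toy values (assumptions, not model output):
# y given x: four tokens, each with probability 0.5
# x given y: three tokens, each with probability 0.25
forward = [math.log(0.5)] * 4
backward = [math.log(0.25)] * 3
distance = information_distance(forward, backward)
```

With these toy inputs the two code lengths are about 4 and 6 bits, so the distance is about 6 bits. A lower value means each text is easier to generate from the other under the scoring model.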

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques