R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
Qingfei Zhao, Ruobing Wang, Dingling Xu, Daren Zha, Limin Liu

TL;DR
R-Search introduces a reinforcement learning framework that enhances large language models' reasoning by optimizing their interaction with search, leading to significant improvements in complex, knowledge-intensive tasks.
Contribution
It presents a novel multi-reward RL approach for dynamic reasoning-search integration, enabling LLMs to better decide when to retrieve information or reason, improving response quality.
Findings
Outperforms RAG baselines by up to 32.2% in-domain
Achieves 25.1% improvement out-of-domain
Effectively learns optimal reasoning-search trajectories
Abstract
Large language models (LLMs) have notably progressed in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interactions with search remains a non-trivial challenge, as models often fail to identify optimal reasoning-search interaction trajectories, resulting in suboptimal responses. We propose R-Search, a novel reinforcement learning framework for Reasoning-Search integration, designed to enable LLMs to autonomously execute multi-step reasoning with deep search interaction, and learn optimal reasoning search interaction trajectories via multi-reward signals, improving response quality in complex logic- and knowledge-intensive tasks. R-Search guides the LLM to dynamically decide when to retrieve or reason, while globally integrating key evidence to enhance deep knowledge interaction between reasoning and search. During RL training,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Layer Normalization · Linear Warmup With Linear Decay · Attention Dropout · Byte Pair Encoding · Softmax · Linear Layer · Dropout · Dense Connections · Attention Is All You Need
