TL;DR
UR$^2$ is a reinforcement learning framework that unifies retrieval-augmented generation and reasoning, enhancing large language models' performance across diverse tasks by dynamically coordinating retrieval and reasoning strategies.
Contribution
The paper introduces UR$^2$, a novel framework that combines retrieval and reasoning with a difficulty-aware curriculum and hybrid knowledge access, improving robustness and generalization.
Findings
UR$^2$ outperforms existing RAG and RL baselines on multiple tasks.
UR$^2$ achieves performance comparable to GPT-4o-mini and GPT-4.1-mini on several benchmarks.
The code for UR$^2$ is publicly available at the provided GitHub link.
Abstract
Large Language Models (LLMs) have shown strong capabilities through two complementary paradigms: Retrieval-Augmented Generation (RAG) for knowledge grounding and Reinforcement Learning from Verifiable Rewards (RLVR) for complex reasoning. However, existing attempts to unify these paradigms remain narrow in scope, typically limited to open-domain QA with fixed retrieval settings, which constrains generalization to broader domains. To address this limitation, we propose UR$^2$ (Unified RAG and Reasoning), a general reinforcement learning framework that dynamically coordinates retrieval and reasoning. UR$^2$ introduces two key designs: a difficulty-aware curriculum that selectively invokes retrieval only for challenging instances, and a hybrid knowledge access strategy that combines domain-specific offline corpora with on-the-fly LLM-generated summaries. Together, these components…
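The two designs named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Instance.difficulty` score, the `threshold` value, the keyword-match retriever, and the `summarize` callable are all hypothetical stand-ins, and reading "hybrid knowledge access" as "retrieve from an offline corpus, then condense with an LLM summarizer" is one possible interpretation of the abstract.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Instance:
    question: str
    difficulty: float  # hypothetical score in [0, 1]; higher means harder


def hybrid_lookup(
    query: str,
    corpus: Dict[str, str],
    summarize: Callable[[str, List[str]], str],
) -> str:
    """Fetch passages from an offline corpus, then condense them.

    `corpus` maps a keyword to a passage (a toy stand-in for a real index).
    `summarize` stands in for an LLM call; its signature is an assumption.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    passages = [text for key, text in corpus.items() if key.lower() in terms]
    return summarize(query, passages)


def build_context(
    inst: Instance,
    corpus: Dict[str, str],
    summarize: Callable[[str, List[str]], str],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> str:
    """Difficulty gate: skip retrieval for easy instances."""
    if inst.difficulty < threshold:
        return ""  # easy instance: reason without retrieved context
    return hybrid_lookup(inst.question, corpus, summarize)
```

In this sketch an easy instance yields an empty context, while a hard one receives whatever the summarizer returns for the matched passages. How difficulty is actually estimated and how the curriculum enters RL training are not specified in the excerpt above, so both are left out.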
