RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
Siwei Zhang, Yun Xiong, Xi Chen, Zi'an Jia, Renhong Huang, Jiarong Xu, Jiawei Zhang

TL;DR
RAPO introduces a retrieval-augmented framework for agentic RL that enhances exploration and training efficiency in large language model agents by leveraging step-level traces and retrieval-based rewards.
Contribution
The paper proposes RAPO, a novel RL framework that incorporates retrieval to expand exploration and improve training stability in LLM-based agents.
Findings
Achieves a +5.0% average gain across fourteen datasets.
Delivers 1.2x faster training.
Enhances exploration via retrieval-augmented reasoning.
Abstract
Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential in large language model (LLM)-based agents. These methods empower LLM agents to tackle complex tasks via multi-step, tool-integrated reasoning. However, an inherent limitation of existing Agentic RL methods is their reliance on a purely on-policy paradigm for exploration, which restricts exploration to the agent's self-generated outputs and prevents the discovery of new reasoning perspectives for further improvement. While recent efforts incorporate auxiliary off-policy signals to enhance exploration, they typically use full off-policy trajectories for trajectory-level policy estimation, overlooking the need for fine-grained, step-level exploratory dynamics within agentic rollouts. In this paper, we revisit exploration in Agentic RL and propose Retrieval-Augmented Policy Optimization (RAPO), a novel RL…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics
