RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Siwei Zhang; Yun Xiong; Xi Chen; Zi'an Jia; Renhong Huang; Jiarong Xu; Jiawei Zhang

arXiv:2603.03078·cs.AI·March 4, 2026

Open Access

TL;DR

RAPO introduces a retrieval-augmented framework for agentic RL that enhances exploration and training efficiency in large language model agents by leveraging step-level traces and retrieval-based rewards.

Contribution

The paper proposes RAPO, a novel RL framework that incorporates retrieval to expand exploration and improve training stability in LLM-based agents.

Findings

01. Achieves a +5.0% average gain across fourteen datasets.
02. Delivers 1.2x higher training efficiency.
03. Enhances exploration via retrieval-augmented reasoning.

Abstract

Agentic Reinforcement Learning (Agentic RL) has shown remarkable potential for large language model (LLM)-based agents. These methods empower LLM agents to tackle complex tasks via multi-step, tool-integrated reasoning. However, an inherent limitation of existing Agentic RL methods is their reliance on a purely on-policy paradigm for exploration, which restricts exploration to the agent's self-generated outputs and prevents the discovery of new reasoning perspectives for further improvement. While recent efforts incorporate auxiliary off-policy signals to enhance exploration, they typically use full off-policy trajectories for trajectory-level policy estimation, overlooking the need for fine-grained, step-level exploratory dynamics within agentic rollouts. In this paper, we revisit exploration in Agentic RL and propose Retrieval-Augmented Policy Optimization (RAPO), a novel RL…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics