TL;DR
RetroAgent is an online reinforcement learning framework for LLM agents that combines extrinsic task rewards with retrospective intrinsic feedback to improve exploration and experience reuse, and reports state-of-the-art performance on complex interactive tasks.
Contribution
The paper proposes an online RL method that uses hindsight self-reflection and a retrieval strategy to improve exploration and experience reuse in LLM agents.
Findings
Achieves new SOTA performance on four challenging tasks.
Surpasses baseline methods by significant percentage margins.
Demonstrates strong adaptation and generalization capabilities.
Abstract
Standard reinforcement learning (RL) for large language model (LLM) agents typically optimizes extrinsic rewards, prioritizing isolated task completion over continual adaptation. Consequently, agents often converge to suboptimal policies due to limited exploration. Furthermore, accumulated experience remains implicitly trapped within model parameters, hindering its explicit reuse for guiding future decisions. Inspired by human retrospective self-improvement, we introduce RetroAgent, an online RL framework that trains agents to master complex interactive environments not only by solving tasks, but by evolving under the joint guidance of extrinsic task rewards and retrospective dual intrinsic feedback. Specifically, RetroAgent employs a hindsight self-reflection mechanism that generates two complementary signals: (1) intrinsic numerical feedback, which rewards promising exploration by…
