Experiential Reinforcement Learning
Taiwei Shi, Sihao Chen, Bowen Jiang, Linxin Song, Longqi Yang, Jieyu Zhao

TL;DR
Experiential Reinforcement Learning (ERL) adds an explicit reflection loop to RL training. The loop turns sparse, delayed feedback into structured behavioral revision, helping language models learn more efficiently and reach better performance.
Contribution
ERL is a novel training paradigm that incorporates an experience-reflection-consolidation cycle, enhancing learning stability and effectiveness in sparse-reward environments.
Findings
ERL improves performance by up to 81% in complex environments.
ERL outperforms strong RL baselines in reasoning tasks.
Structured reflection enhances behavioral learning and stability.
Abstract
Reinforcement learning has become the central approach for language models (LMs) to learn from environmental reward or feedback. In practice, the environmental feedback is usually sparse and delayed. Learning from such signals is challenging, as LMs must implicitly infer how observed failures should translate into behavioral changes for future iterations. We introduce Experiential Reinforcement Learning (ERL), a training paradigm that embeds an explicit experience-reflection-consolidation loop into the reinforcement learning process. Given a task, the model generates an initial attempt, receives environmental feedback, and produces a reflection that guides a refined second attempt, whose success is reinforced and internalized into the base policy. This process converts feedback into structured behavioral revision, improving exploration and stabilizing optimization while preserving gains…
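The loop described in the abstract (first attempt, environmental feedback, reflection, refined second attempt, consolidation) can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: every name here (`Episode`, `erl_step`, `consolidation_targets`, and the `toy_*` functions) is hypothetical, the string-reversal task is a stand-in for a real environment, and the assumption that consolidation trains the base policy on successful second attempts without the reflection in context is one reading of "internalized into the base policy".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Episode:
    """One pass through the experience-reflection loop (hypothetical record)."""
    task: str
    first_attempt: str
    feedback: str
    reflection: str
    second_attempt: str
    reward: float


def erl_step(
    task: str,
    attempt: Callable[[str], str],
    evaluate: Callable[[str, str], Tuple[float, str]],
    reflect: Callable[[str, str, str], str],
    revise: Callable[[str, str], str],
) -> Episode:
    # Experience: initial attempt plus environmental feedback.
    first = attempt(task)
    _, feedback = evaluate(task, first)
    # Reflection: turn the feedback into explicit guidance.
    reflection = reflect(task, first, feedback)
    # Refined second attempt, guided by the reflection, then scored.
    second = revise(task, reflection)
    reward, _ = evaluate(task, second)
    return Episode(task, first, feedback, reflection, second, reward)


def consolidation_targets(
    episodes: List[Episode], threshold: float = 1.0
) -> List[Tuple[str, str]]:
    # Assumption: consolidation keeps (task, second_attempt) pairs that
    # succeeded, so the base policy can be trained to produce them
    # directly from the task, without the reflection as input.
    return [(e.task, e.second_attempt) for e in episodes if e.reward >= threshold]


# Toy stand-ins for the policy and environment (all hypothetical).
def toy_attempt(task: str) -> str:
    return task  # naive policy: echoes the input


def toy_evaluate(task: str, answer: str) -> Tuple[float, str]:
    ok = answer == task[::-1]  # the toy task is "reverse the string"
    return (1.0 if ok else 0.0), ("correct" if ok else "output was not reversed")


def toy_reflect(task: str, answer: str, feedback: str) -> str:
    return "reverse the characters" if "not reversed" in feedback else "keep the answer"


def toy_revise(task: str, reflection: str) -> str:
    return task[::-1] if "reverse" in reflection else task


if __name__ == "__main__":
    episode = erl_step("abc", toy_attempt, toy_evaluate, toy_reflect, toy_revise)
    print(episode)
    print(consolidation_targets([episode]))
```

In a real setup the four callables would be calls to the language model and the environment, and the consolidation targets would feed a policy update rather than a plain list.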
Taxonomy
Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
