Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs
Wenjian Zhang, Kongcheng Zhang, Jiaxin Qi, Baisheng Lai, Jianqiang Huang

TL;DR
HeRL is a novel reinforcement learning framework that uses hindsight experience and bonus rewards to improve exploration and learning efficiency in training large language models, leading to better reasoning capabilities.
Contribution
The paper introduces HeRL, a new RL method that explicitly guides exploration using hindsight experiences and reward bonuses, enhancing LLM training beyond traditional approaches.
Findings
HeRL outperforms baseline methods on multiple benchmarks.
Incorporating hindsight experience improves exploration efficiency.
Experience-guided self-improvement further boosts performance.
Abstract
Reinforcement Learning (RL) with rubric-based rewards has recently shown remarkable progress in enhancing the general reasoning capabilities of Large Language Models (LLMs), yet it still suffers from ineffective exploration confined to the current policy distribution. In fact, RL optimization can be viewed as steering the policy toward an ideal distribution that maximizes the rewards, while effective exploration should align its efforts with the desired target. Leveraging this insight, we propose HeRL, a Hindsight experience guided Reinforcement Learning framework to bootstrap effective exploration by explicitly telling LLMs the desired behaviors specified in rewards. Concretely, HeRL treats failed trajectories along with their unmet rubrics as hindsight experience, which serves as in-context guidance for the policy to explore desired responses beyond its current distribution. Additionally, we introduce a…
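The idea of turning a failed trajectory and its unmet rubrics into in-context guidance can be illustrated with a small sketch. This is not the authors' implementation: the `RubricResult` type, the `build_hindsight_prompt` function, and the prompt wording are all hypothetical names and choices made here for illustration, and the abstract does not specify how the guidance is actually formatted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricResult:
    """One rubric criterion and whether a response satisfied it (hypothetical type)."""
    rubric: str
    met: bool


def build_hindsight_prompt(
    question: str,
    failed_response: str,
    results: List[RubricResult],
) -> str:
    """Build a follow-up prompt from a failed attempt and its unmet rubrics.

    Assumption: the guidance is plain text appended to the original question.
    The wording below is illustrative only.
    """
    unmet = [r.rubric for r in results if not r.met]
    if not unmet:
        # Nothing failed, so there is no hindsight experience to add.
        return question

    lines = [
        question,
        "",
        "A previous attempt:",
        failed_response,
        "",
        "It did not satisfy these criteria:",
    ]
    lines += [f"- {rubric}" for rubric in unmet]
    lines += ["", "Write an improved response that satisfies all criteria."]
    return "\n".join(lines)
```

In this sketch, only the unmet criteria are surfaced to the policy, so the follow-up attempt is steered toward the behaviors the reward still asks for rather than toward those already satisfied.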
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
