Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs

Wenjian Zhang; Kongcheng Zhang; Jiaxin Qi; Baisheng Lai; Jianqiang Huang

arXiv:2603.20046·cs.AI·March 23, 2026

TL;DR

HeRL is a novel reinforcement learning framework that uses hindsight experience and bonus rewards to improve exploration and learning efficiency in training large language models, leading to better reasoning capabilities.

Contribution

The paper introduces HeRL, a new RL method that explicitly guides exploration using hindsight experiences and reward bonuses, enhancing LLM training beyond traditional approaches.

Findings

1. HeRL outperforms baseline methods on multiple benchmarks.
2. Incorporating hindsight experience improves exploration efficiency.
3. Experience-guided self-improvement further boosts performance.

Abstract

Reinforcement Learning (RL) with rubric-based rewards has recently shown remarkable progress in enhancing the general reasoning capabilities of Large Language Models (LLMs), yet it still suffers from ineffective exploration confined to the current policy distribution. In fact, RL optimization can be viewed as steering the policy toward an ideal distribution that maximizes the rewards, while effective exploration should align its efforts with the desired target. Leveraging this insight, we propose HeRL, a Hindsight experience guided Reinforcement Learning framework to bootstrap effective exploration by explicitly telling LLMs the desired behaviors specified in rewards. Concretely, HeRL treats failed trajectories along with their unmet rubrics as hindsight experience, which serves as in-context guidance for the policy to explore desired responses beyond its current distribution. Additionally, we introduce a…
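The hindsight-experience idea in the abstract (a failed response plus its unmet rubrics, fed back as in-context guidance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `RubricResult` type, the function names, and the prompt wording are all hypothetical, chosen only to show how unmet rubric criteria might be turned into guidance for a second attempt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RubricResult:
    """One rubric criterion and whether a response met it (hypothetical type)."""
    criterion: str
    satisfied: bool


def unmet_rubrics(results: List[RubricResult]) -> List[str]:
    """Return the criteria that the response failed to satisfy, in order."""
    return [r.criterion for r in results if not r.satisfied]


def build_hindsight_prompt(
    question: str, failed_response: str, results: List[RubricResult]
) -> str:
    """Build a prompt that shows the failed attempt and its unmet criteria.

    Assumption: the prompt template below is illustrative only; the paper's
    actual template is not given in the abstract.
    """
    missing = unmet_rubrics(results)
    if not missing:
        # Nothing failed, so there is no hindsight experience to add.
        return question
    bullet_list = "\n".join(f"- {c}" for c in missing)
    return (
        f"{question}\n\n"
        f"A previous attempt:\n{failed_response}\n\n"
        f"That attempt did not satisfy these criteria:\n{bullet_list}\n\n"
        "Write an improved response that satisfies them."
    )
```

Under these assumptions, a policy would be sampled again on the string returned by `build_hindsight_prompt`, so that the new response is conditioned on what the earlier one lacked. How such responses are then rewarded and used for policy updates is not covered by this sketch.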

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications