Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

Fei Bai; Zhipeng Chen; Chuan Hao; Ming Yang; Ran Tao; Bryan Dai; Wayne Xin Zhao; Jian Yang; Hongteng Xu

arXiv:2603.24093·cs.LG·March 26, 2026

TL;DR

This paper introduces Dual Guidance Optimization (DGO), a framework that improves reinforcement learning from verifiable rewards (RLVR) for large language models by using both external and internal experience to guide exploration and by internalizing that experience during training, with the goal of stronger reasoning capabilities.

Contribution

DGO is a novel unified framework that constructs an experience bank and guides exploration, improving RLVR training by mimicking human-like internalization of experience.

Findings

01. DGO outperforms baseline methods in reasoning tasks.

02. Utilizing experience banks improves exploration efficiency.

03. Internalization of experience enhances model stability.

Abstract

Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards (RLVR) has emerged as a promising paradigm for reasoning tasks. However, existing RL-based training still remains only a rough approximation to human learning. Human learners leverage both external and internal experience to guide exploration and gradually internalize useful trajectories into stable knowledge. Motivated by this gap, we ask: how can LLMs better utilize and internalize experience during RLVR training? To answer this question, we propose Dual Guidance Optimization (DGO), a unified framework that leverages external and internal experience to improve training effectiveness. Specifically, DGO first constructs an experience bank from…

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications