Dynamic Dual-Granularity Skill Bank for Agentic RL

Songjun Tu; Chengdong Xu; Qichao Zhang; Yaocheng Zhang; Xiangyuan Lan; Linjing Li; Dongbin Zhao

arXiv:2603.28716 · cs.AI · March 31, 2026


TL;DR

This paper introduces D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills and step skills, improving success rates by 10-20 points over baselines across multiple environments.

Contribution

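It proposes a skill bank that stores experience at two granularities (task skills for high-level guidance, step skills for fine-grained decision support and error correction) and maintains it dynamically through reflection-based expansion and utility-aware retrieval and pruning.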

Findings

01. D2Skill improves success rates by 10-20 points over baselines.

02. Both dual-granularity modeling and dynamic maintenance are essential for performance gains.

03. Learned skills transfer across different evaluation settings and add modest training overhead.

Abstract

Agentic reinforcement learning (RL) can benefit substantially from reusable experience, yet existing skill-based methods mainly extract trajectory-level guidance and often lack principled mechanisms for maintaining an evolving skill memory. We propose D2Skill, a dynamic dual-granularity skill bank for agentic RL that organizes reusable experience into task skills for high-level guidance and step skills for fine-grained decision support and error correction. D2Skill jointly trains the policy and skill bank through paired baseline and skill-injected rollouts under the same policy, using their performance gap to derive hindsight utility signals for both skill updating and policy optimization. Built entirely from training-time experience, the skill bank is continuously expanded through reflection and maintained with utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop…
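
The utility mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and method names (`Skill`, `SkillBank`, `update`, `retrieve`, `prune`), the exponential moving average, and the thresholds are all assumptions made for illustration. It shows only the core idea that the return gap between a skill-injected rollout and a baseline rollout can serve as a hindsight utility signal, which then drives ranking and pruning. A real system would presumably also match skills to the current task or state when retrieving, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    text: str
    granularity: str      # "task" or "step", following the abstract's terms
    utility: float = 0.0  # running estimate of how much the skill helps
    uses: int = 0         # number of paired rollouts that used this skill


@dataclass
class SkillBank:
    skills: list = field(default_factory=list)
    ema: float = 0.5  # assumed smoothing factor, not taken from the paper

    def add(self, text, granularity):
        skill = Skill(text, granularity)
        self.skills.append(skill)
        return skill

    def update(self, skill, baseline_return, injected_return):
        # Hindsight utility: gap between the skill-injected rollout and the
        # baseline rollout collected under the same policy.
        gap = injected_return - baseline_return
        skill.utility = (1 - self.ema) * skill.utility + self.ema * gap
        skill.uses += 1

    def retrieve(self, granularity, k=1):
        # Utility-aware retrieval, reduced here to ranking by utility alone.
        candidates = [s for s in self.skills if s.granularity == granularity]
        return sorted(candidates, key=lambda s: s.utility, reverse=True)[:k]

    def prune(self, min_utility=0.0, min_uses=2):
        # Drop skills that have been tried enough times and still do not help.
        self.skills = [
            s for s in self.skills
            if s.uses < min_uses or s.utility >= min_utility
        ]


# Hypothetical usage with made-up skills and returns.
bank = SkillBank()
good = bank.add("Check the receptacle before picking up the object", "task")
bad = bank.add("Click the first search result", "step")
for _ in range(2):
    bank.update(good, baseline_return=0.0, injected_return=1.0)
    bank.update(bad, baseline_return=1.0, injected_return=0.0)
bank.prune()
print([s.text for s in bank.skills])
```

In this toy run the skill that consistently raises the return keeps a positive utility and survives pruning, while the skill that lowers it is removed after two uses. The `min_uses` guard is a design choice in the sketch so that newly added skills are not pruned before they have been evaluated.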
