Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning
Yaorui Shi, Yuxin Chen, Zhengxi Lu, Yuchun Miao, Shugui Liu, Qi GU, Xunliang Cai, Xiang Wang, An Zhang

TL;DR
Skill1 is a unified reinforcement learning framework in which a single policy co-evolves skill selection, utilization, and distillation from one task-outcome signal, outperforming prior skill-based and RL baselines on ALFWorld and WebShop.
Contribution
Skill1 trains skill selection, utilization, and distillation jointly in a single policy, avoiding the partial and conflicting evolution that arises when these capabilities are optimized in isolation or with separate reward sources.
Findings
Skill1 outperforms prior skill-based and RL baselines on ALFWorld and WebShop.
The framework co-evolves the three capabilities through a single shared task-outcome signal.
Ablation studies confirm the importance of integrated credit assignment for skill development.
Abstract
A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize these capabilities in isolation or with separate reward sources, resulting in partial and conflicting evolution. We propose Skill1, a framework that trains a single policy to co-evolve skill selection, utilization, and distillation toward a shared task-outcome objective. The policy generates a query to search the skill library, re-ranks candidates to select one, solves the task conditioned on it, and distills a new skill from the trajectory. All learning derives from a single task-outcome signal. Its low-frequency trend credits selection and its high-frequency variation credits distillation.…
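The loop described in the abstract (query, re-rank, solve, distill, then credit from one outcome) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is truncated and does not say how the trend and variation are computed, so an exponential moving average is used here purely as a stand-in. The names `OutcomeTracker`, `run_episode`, `make_query`, `rerank`, `solve`, and `distill` are hypothetical, and the substring match is only a placeholder for real library search.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class OutcomeTracker:
    """Splits a stream of task outcomes into a slow trend and a fast residual.

    Assumption: an exponential moving average stands in for the
    low-frequency trend; the source does not specify the actual method.
    """
    alpha: float = 0.1
    trend: Optional[float] = None

    def update(self, outcome: float) -> Tuple[float, float]:
        # The first outcome seeds the trend, so its residual is zero.
        baseline = outcome if self.trend is None else self.trend
        variation = outcome - baseline                 # high-frequency part
        self.trend = baseline + self.alpha * variation  # low-frequency part
        return self.trend, variation


def run_episode(
    task: str,
    library: List[str],
    tracker: OutcomeTracker,
    make_query: Callable[[str], str],
    rerank: Callable[[str, List[str]], str],
    solve: Callable[[str, Optional[str]], Tuple[List[str], float]],
    distill: Callable[[List[str]], str],
) -> dict:
    """One pass through the four stages, driven by a single outcome."""
    query = make_query(task)                              # 1. generate a search query
    candidates = [s for s in library if query in s]       # placeholder retrieval
    skill = rerank(task, candidates) if candidates else None  # 2. select one skill
    trajectory, outcome = solve(task, skill)              # 3. solve conditioned on it
    library.append(distill(trajectory))                   # 4. distill a new skill
    selection_credit, distillation_credit = tracker.update(outcome)
    return {
        "skill": skill,
        "outcome": outcome,
        "selection_credit": selection_credit,        # from the trend
        "distillation_credit": distillation_credit,  # from the variation
    }
```

In a real training setup the four callables would all be the same policy prompted for different stages, and the two credit values would feed a policy-gradient update; both of those parts are omitted here.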
