Skill-R1: Agent Skill Evolution via Reinforcement Learning

Yash Vishe; Rohan Surana; Xunyi Jiang; Zihan Huang; Xintong Li; Nikki Lijing Kuang; Tong Yu; Ryan A. Rossi; Jingbo Shang; Julian McAuley; Junda Wu

arXiv:2605.09359·cs.LG·May 12, 2026

TL;DR

Skill-R1 is a reinforcement learning framework for efficient, instance-level skill evolution in LLM agents: it trains a lightweight skill generator that improves multi-step task performance without updating the task LLM itself.

Contribution

It introduces a bi-level policy optimization method for recurrent skill refinement, compatible with black-box models, and demonstrates improved performance on complex tasks.

Findings

01. Skill-R1 outperforms no-skill baselines and standard GRPO.

02. It achieves significant gains on complex, multi-step tasks.

03. The framework enables cost-effective skill adaptation without model fine-tuning.

Abstract

Agentic large language models often rely on skills, reusable natural language procedures that guide planning, action, and tool use. In practice, skills are typically improved through prompt engineering or by aligning the task LLM itself, which is costly, model-specific, and often infeasible for closed-source models. Skill optimization is not a one-step problem but a recurrent process with two coupled levels of credit assignment: a useful skill must improve rollout quality under current conditioning, while a useful revision must turn observed outcomes into a better skill for the next round. We propose Skill-R1, a reinforcement learning framework for instance-level recurrent skill optimization from verifiable rewards. Rather than updating the task LLM, Skill-R1 trains a lightweight skill generator that conditions on the task context, prior rollouts, and their verified outcomes to produce…
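
The recurrent loop described in the abstract (generate a skill, roll out the task LLM under it, verify the outcome, then revise the skill from that history) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Rollout`, `refine_skill`, `generate_skill`, `run_task_model`, and `verify` are hypothetical, the skill generator is a hand-written stub rather than a trained policy, and no reinforcement learning update is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rollout:
    skill: str     # skill text the task model was conditioned on
    output: str    # what the task model produced
    reward: float  # verified outcome (1.0 = passed, 0.0 = failed)


def refine_skill(
    task: str,
    generate_skill: Callable[[str, List[Rollout]], str],
    run_task_model: Callable[[str, str], str],
    verify: Callable[[str, str], float],
    rounds: int = 3,
) -> Tuple[str, List[Rollout]]:
    """Propose a skill, roll out, verify, and revise for a fixed number of rounds.

    The task model is only called, never updated, so it may be a black box.
    """
    history: List[Rollout] = []
    best_skill, best_reward = "", float("-inf")
    for _ in range(rounds):
        # The generator sees the task plus all prior rollouts and their rewards.
        skill = generate_skill(task, history)
        output = run_task_model(task, skill)
        reward = verify(task, output)
        history.append(Rollout(skill, output, reward))
        if reward > best_reward:
            best_skill, best_reward = skill, reward
    return best_skill, history


# --- Toy stand-ins (assumptions for illustration only) ---
def toy_generate_skill(task: str, history: List[Rollout]) -> str:
    if not history:
        return "Answer directly."
    passed = [r for r in history if r.reward == 1.0]
    if passed:
        return passed[-1].skill  # keep a skill that already verified
    return "Work step by step and check the final sum."


def toy_task_model(task: str, skill: str) -> str:
    # Pretend the task model only succeeds under the step-by-step skill.
    return "5" if "step by step" in skill else "4"


def toy_verify(task: str, output: str) -> float:
    return 1.0 if output == "5" else 0.0


best, history = refine_skill(
    "What is 2 + 3?", toy_generate_skill, toy_task_model, toy_verify
)
```

In this toy run the first skill fails verification, the revised skill passes, and the generator then keeps it. In the framework the abstract describes, the generator would instead be a learned model trained from the verified rewards; the stub only shows where the two levels of credit assignment enter: the reward of a rollout under the current skill, and the quality of the revision that follows it.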
