Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

Hao Wang; Guozhi Wang; Han Xiao; Yufeng Zhou; Yue Pan; Jichao Wang; Ke Xu; Yafei Wen; Xiaohu Ruan; Xiaoxin Chen; Honggang Qi

arXiv:2604.10674 · cs.LG · April 14, 2026

TL;DR

Skill-SD introduces a dynamic self-distillation framework that transforms an agent's trajectories into natural language skills, enhancing multi-turn LLM agent training by providing adaptive supervision and stabilizing learning.

Contribution

The paper proposes Skill-SD, a novel method that uses agent-generated skills as dynamic privileged information for improved training stability and performance in multi-turn LLM agents.

Findings

01. Skill-SD outperforms standard RL and OPSD baselines on agentic benchmarks.

02. It achieves +14.0%/+10.9% improvements on AppWorld/Sokoban with GRPO.

03. It achieves +42.1%/+40.6% improvements with vanilla OPD.

Abstract

Reinforcement learning (RL) has been widely used to train LLM agents for multi-turn interactive tasks, but its sample efficiency is severely limited by sparse rewards and long horizons. On-policy self-distillation (OPSD) alleviates this by providing dense token-level supervision from a privileged teacher that has access to ground-truth answers. However, such fixed privileged information cannot capture the diverse valid strategies in agent tasks, and naively combining OPSD with RL often leads to training collapse. To address these limitations, we introduce Skill-SD, a framework that turns the agent's own trajectories into dynamic training-only supervision. Completed trajectories are summarized into compact natural language skills that describe successful behaviors, mistakes, and workflows. These skills serve as dynamic privileged information conditioning only the teacher, while the…
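The core idea in the abstract, a teacher that sees a skill while the student does not, can be illustrated with a small sketch. This is not the paper's implementation: the excerpt does not state the exact objective, so the per-token KL(student || teacher) loss, the prompt layout, and the names `build_contexts`, `token_kl`, and `distill_loss` are all assumptions made for illustration.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def build_contexts(task_prompt, skill):
    """Hypothetical prompt layout: only the teacher context includes the skill."""
    student_ctx = task_prompt
    teacher_ctx = f"{skill}\n\n{task_prompt}"
    return student_ctx, teacher_ctx

def token_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one token position (direction is an assumption)."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def distill_loss(student_logits_seq, teacher_logits_seq):
    """Mean per-token KL over a sequence: a dense, token-level training signal."""
    kls = [token_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq)]
    return sum(kls) / len(kls)

# Toy usage: in practice both logit sequences would come from the same model,
# scored on the same sampled tokens but under the two different contexts.
student_ctx, teacher_ctx = build_contexts(
    "Task: push the box onto the goal.",
    "Skill: check for walls behind the box before pushing.",
)
student_logits = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
teacher_logits = [[3.0, 2.0, 1.0], [0.5, 0.5, 0.5]]
loss = distill_loss(student_logits, teacher_logits)
```

The loss is zero when the two distributions agree at every position and positive otherwise, so the skill influences training only through the teacher's token distribution and is never shown to the student.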

Code & Models

Repositories

https://k1xe.github.io/skill-sd
