SkillFactory: Self-Distillation For Learning Cognitive Behaviors

Zayne Sprague; Jack Lu; Manya Wadhwa; Sedrick Keh; Mengye Ren; Greg Durrett

arXiv:2512.04072 · cs.CL · April 13, 2026

TL;DR

SkillFactory is a supervised fine-tuning method that uses a model's own samples to prime it for cognitive skills, so that it generalizes better and is more robust after reinforcement learning.

Contribution

The paper presents a novel fine-tuning approach that leverages self-sampled data to enable models to acquire cognitive skills before reinforcement learning.

Findings

01

SkillFactory improves model generalization to harder tasks post-RL.

02

Models trained with SkillFactory utilize cognitive skills effectively.

03

SkillFactory models are more robust to regression on out-of-domain tasks.

Abstract

Reasoning models leveraging long chains of thought employ various cognitive skills, such as verification of their answers, backtracking, retrying by an alternate method, and more. Previous work has shown that when a base language model already exhibits these skills, further training with reinforcement learning (RL) can teach the model to leverage them. How can we get models to leverage skills that aren't exhibited by base models? Our work, SkillFactory, is a method for fine-tuning models to roughly learn these skills during a supervised fine-tuning (SFT) stage prior to RL. Our approach does not rely on distillation from a stronger model, but instead uses samples from the model itself, rearranged to provide training data in the format of those skills. These "silver" SFT traces may be imperfect, but are nevertheless effective for priming a model to acquire skills during RL. Our evaluation shows…
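The idea of rearranging a model's own samples into skill-formatted traces can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the `Sample` structure, the `build_silver_trace` helper, the use of a reference answer to label attempts, and the wording of the reflection sentences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One self-generated solution attempt (hypothetical structure)."""
    reasoning: str
    answer: str


def build_silver_trace(
    samples: List[Sample], reference_answer: str, max_wrong: int = 2
) -> Optional[str]:
    """Rearrange a model's own attempts into one trace that shows
    verification and retrying. Assumes attempts can be labeled by
    comparing against a reference answer (an illustrative choice)."""
    wrong = [s for s in samples if s.answer != reference_answer][:max_wrong]
    right = [s for s in samples if s.answer == reference_answer]
    if not right:
        return None  # no correct attempt to end the trace on

    parts: List[str] = []
    for s in wrong:
        # Failed attempt, then a templated check that triggers a retry.
        parts.append(s.reasoning)
        parts.append(
            f"Let me verify: the answer {s.answer} does not check out. "
            "Let me try another approach."
        )
    final = right[0]
    parts.append(final.reasoning)
    parts.append(f"Let me verify: the answer {final.answer} checks out.")
    parts.append(f"Final answer: {final.answer}")
    return "\n".join(parts)
```

A trace built this way is "silver" in the abstract's sense: the reflection text is templated rather than genuinely reasoned, so it may be imperfect, but it places the model's own outputs in the shape of the target skill before RL.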

Code & Models

Repositories

zayne-sprague/SkillFactory (GitHub)

Videos

SkillFactory: Self-Distillation for Learning Cognitive Behaviors (SlidesLive)