Self-Distillation Enables Continual Learning
Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal

TL;DR
This paper introduces Self-Distillation Fine-Tuning (SDFT), a method for on-policy learning from demonstrations that improves continual learning by reducing forgetting while improving the acquisition of new skills.
Contribution
The paper proposes SDFT, a simple yet effective on-policy distillation technique that enables continual learning from demonstrations, outperforming supervised fine-tuning in retaining and acquiring skills.
Findings
SDFT outperforms supervised fine-tuning in skill learning tasks.
SDFT significantly reduces catastrophic forgetting in sequential learning.
SDFT enables models to accumulate multiple skills over time without performance loss.
Abstract
Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy. We introduce Self-Distillation Fine-Tuning (SDFT), a simple method that enables on-policy learning directly from demonstrations. SDFT leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills. Across skill learning and knowledge acquisition tasks, SDFT consistently outperforms SFT, achieving higher new-task accuracy while…
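The idea in the abstract (one model acts as the student when given only the prompt, and as its own teacher when the demonstration is added to its context) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_logits` is a hypothetical stand-in for a language model, the per-token reverse KL divergence is one plausible choice of distillation signal, and no gradient step is shown. It is not the authors' implementation.

```python
import math
import random

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for a single next-token distribution."""
    return sum(p * math.log(p / q)
               for p, q in zip(student_probs, teacher_probs) if p > 0)

def toy_logits(weights, context):
    """Hypothetical toy 'model': each token's logit is its weight plus the
    number of times that token already appears in the context."""
    return [weights[t] + sum(1.0 for c in context if c == t)
            for t in range(len(weights))]

def sdft_style_loss(weights, prompt, demonstration, length, rng):
    """Sample a response from the student (prompt only) and, at every step,
    compare it with the teacher: the same weights, with the demonstration
    prepended to the context. Returns (mean per-token loss, sampled tokens)."""
    response = []
    total = 0.0
    for _ in range(length):
        student = softmax(toy_logits(weights, prompt + response))
        teacher = softmax(toy_logits(weights, demonstration + prompt + response))
        total += reverse_kl(student, teacher)
        # On-policy: the next token is drawn from the student, not the teacher.
        token = rng.choices(range(len(weights)), weights=student)[0]
        response.append(token)
    return total / length, response

if __name__ == "__main__":
    rng = random.Random(0)
    loss, tokens = sdft_style_loss([0.0, 0.5, -0.5], [0, 1], [2, 2, 2], 4, rng)
    print("sampled tokens:", tokens)
    print("mean reverse KL:", loss)
```

With an empty demonstration the student and teacher see the same context, so the loss is zero. The training signal in this sketch comes only from what the demonstration changes in the model's own predictions, evaluated on sequences the student itself produced.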
Code & Models
- 🤗 yasserrmd/LFM2.5-1.2B-onpolicy (model) · 3 dl
- 🤗 yasserrmd/lfm2.5-1.5b-sdft (model) · 1 dl · ♡ 1
- 🤗 aolans/Qwen2.5-7B-Instruct-SDFT-fp16 (model) · 16 dl
- 🤗 aolans/Qwen2.5-7B-Instruct-SDFT-2ep-fp16 (model) · 11 dl
- 🤗 ayushnangia-sdft/qwen2.5-7b-instruct-sdft-tooluse-step-100 (model) · 13 dl
- 🤗 ayushnangia-sdft/qwen2.5-7b-instruct-sdft-tooluse-step-200 (model) · 12 dl
- 🤗 ayushnangia-sdft/qwen2.5-7b-instruct-sdft-tooluse-step-300 (model) · 13 dl
- 🤗 ayushnangia-sdft/qwen2.5-7b-instruct-sdft-tooluse-step-400 (model) · 5 dl
- 🤗 ayushnangia-sdft/qwen2.5-7b-instruct-sdft-tooluse-step-500 (model) · 12 dl
- 🤗 ayushnangia-sdft/qwen2.5-7b-instruct-sdft-tooluse-step-600 (model) · 12 dl
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning
