Continual Distillation of Teachers from Different Domains

Nicolas Michel; Maorong Wang; Jiangpeng He; Toshihiko Yamasaki

arXiv:2605.04059·cs.LG·May 7, 2026

TL;DR

This paper introduces Continual Distillation (CD), a paradigm in which a student learns sequentially from a stream of teacher models from different domains without retaining access to earlier teachers, and examines the knowledge transfer and forgetting that result.

Contribution

It proposes Self External Data Distillation (SE2D), which preserves logits on external unlabeled data to stabilize learning from heterogeneous teachers, improving cross-domain generalization and reducing Unseen Knowledge Forgetting.

Findings

01 SE2D effectively reduces Unseen Knowledge Forgetting.

02 External unlabeled data enables transfer from unseen domains.

03 SE2D improves performance across multiple benchmarks.

Abstract

Deep learning models continue to scale, with some requiring more storage than many large-scale datasets. Thus, we introduce a new paradigm: Continual Distillation (CD), where a student learns sequentially from a stream of teacher models without retaining access to earlier teachers. CD faces two challenges: teacher training data is unavailable, and teachers have varying expertise. We show that external unlabeled data enables Unseen Knowledge Transfer (UKT), allowing the student to acquire information from domains not present in the training data, while known to the teacher. We also show that sequential distillation causes Unseen Knowledge Forgetting (UKF) when transferred knowledge is lost after training on later teachers. To better trade off between UKT and UKF, we propose Self External Data Distillation (SE2D), a method that preserves logits on external data to stabilize learning…
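
The abstract is truncated before it states the SE2D objective, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that "preserving logits on external data" means recording the student's own logits on external unlabeled inputs before training on a new teacher, then penalizing drift from them while distilling from the current teacher. The function names, the KL-based form of both terms, and the `temperature` and `preserve_weight` parameters are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, stored_logits,
                      temperature=2.0, preserve_weight=1.0):
    """Hypothetical per-sample loss on one external, unlabeled input.

    teacher_logits: output of the current teacher (the only one available).
    stored_logits:  the student's own output on this input, recorded before
                    training on the current teacher began (an assumption
                    about the mechanism, not taken from the paper).
    """
    student = softmax(student_logits, temperature)
    # Transfer term: match the current teacher on external data.
    transfer = kl_divergence(softmax(teacher_logits, temperature), student)
    # Preservation term: stay close to the previously recorded logits.
    preserve = kl_divergence(softmax(stored_logits, temperature), student)
    return transfer + preserve_weight * preserve


# Example: the student agrees with the teacher but has drifted from its
# stored logits, so only the preservation term contributes.
loss = distillation_loss([2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
```

Under this reading, `preserve_weight` would govern the trade-off the abstract describes: a larger value favors retaining earlier knowledge (less UKF), a smaller one favors absorbing the current teacher (more UKT). The linked repository is the authoritative source for the actual loss.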

Code & Models

Repositories

Nicolas1203/continual_distillation (GitHub)
