Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

Huiyan Xue; Xuming Ran; Yaxin Li; Qi Xu; Enhui Li; Yi Xu; Qiang Zhang

arXiv:2512.15267·cs.LG·December 18, 2025

TL;DR

This paper introduces Selective Subnetwork Distillation (SSD), a novel method for continual learning with sparse neural networks that enhances knowledge transfer and retention without replay or task labels.

Contribution

SSD is a structurally guided distillation framework that aligns subnetworks across tasks, improving continual learning in sparse neural architectures.

Findings

01. SSD improves accuracy on Split CIFAR-10, CIFAR-100, and MNIST.

02. SSD enhances knowledge retention and representation coverage.

03. SSD operates without replay or explicit task labels.

Abstract

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity.…
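
The mechanism the abstract describes (Top-K activation, selecting neurons by activation frequency, then distilling only within that subnetwork plus the output logits) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the loss form, the frequency threshold, or the weighting, so the squared-error hidden term, the temperature-scaled KL term, and the names `threshold`, `temperature`, and `alpha` below are all assumptions chosen for illustration.

```python
import math

def top_k_mask(acts, k):
    """Return a 0/1 mask marking the k largest activations of one layer."""
    top = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k]
    mask = [0] * len(acts)
    for i in top:
        mask[i] = 1
    return mask

def activation_frequency(batch_acts, k):
    """Fraction of samples in which each neuron falls inside the Top-K set."""
    counts = [0] * len(batch_acts[0])
    for acts in batch_acts:
        for i, m in enumerate(top_k_mask(acts, k)):
            counts[i] += m
    return [c / len(batch_acts) for c in counts]

def select_frequent(freq, threshold):
    """Indices of neurons whose Top-K frequency reaches the (assumed) threshold."""
    return [i for i, f in enumerate(freq) if f >= threshold]

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def selective_distillation_loss(teacher_hidden, student_hidden,
                                teacher_logits, student_logits,
                                selected, temperature=2.0, alpha=0.5):
    """Assumed loss: MSE on the selected neurons plus KL(teacher || student) on logits."""
    if selected:
        hidden = sum((teacher_hidden[i] - student_hidden[i]) ** 2
                     for i in selected) / len(selected)
    else:
        hidden = 0.0
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return alpha * hidden + (1 - alpha) * kl

# Toy hidden activations for three samples of an earlier task (made-up numbers).
batch = [[0.1, 0.9, 0.5, 0.3],
         [0.8, 0.7, 0.1, 0.2],
         [0.2, 0.6, 0.9, 0.1]]
freq = activation_frequency(batch, k=2)
selected = select_frequent(freq, threshold=0.5)
loss = selective_distillation_loss(batch[0], [0.1, 0.7, 0.4, 0.9],
                                   [2.0, 0.5], [1.5, 0.8], selected)
```

Because the hidden term is restricted to `selected`, the student is free to change neurons outside the frequently used subnetwork, which is one way to read the abstract's claim that sparse modularity is preserved. Note that the inputs here come from the current batch only, in keeping with the no-replay, no-task-label setting.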

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications