Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory
Huiyan Xue, Xuming Ran, Yaxin Li, Qi Xu, Enhui Li, Yi Xu, Qiang Zhang

TL;DR
This paper introduces Selective Subnetwork Distillation (SSD), a novel method for continual learning with sparse neural networks that enhances knowledge transfer and retention without replay or task labels.
Contribution
SSD is a structurally guided distillation framework that aligns subnetworks across tasks, improving continual learning in sparse neural architectures.
Findings
SSD improves accuracy on Split CIFAR-10, CIFAR-100, and MNIST.
SSD enhances knowledge retention and representation coverage.
SSD operates without replay or explicit task labels.
Abstract
Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity.…
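The mechanism the abstract describes (Top-K subnetworks, selection of frequently active neurons, and distillation restricted to those neurons plus the output logits) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the frequency `threshold`, the use of mean squared error for the hidden-unit term, the temperature-softened KL divergence for the logit term, and the mixing weight `alpha` are all assumptions chosen for illustration, since the abstract does not specify the exact loss.

```python
import math


def top_k_mask(activations, k):
    """Return a 0/1 mask that keeps the k largest activations of one layer."""
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(activations))]


def activation_frequency(batch, k):
    """Fraction of samples in which each neuron falls inside the Top-K set."""
    counts = [0] * len(batch[0])
    for activations in batch:
        for i, m in enumerate(top_k_mask(activations, k)):
            counts[i] += m
    return [c / len(batch) for c in counts]


def select_frequent(freq, threshold):
    """Indices of neurons whose Top-K frequency reaches the (assumed) threshold."""
    return [i for i, f in enumerate(freq) if f >= threshold]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selective_distillation_loss(student_hidden, teacher_hidden,
                                student_logits, teacher_logits,
                                selected, temperature=2.0, alpha=0.5):
    """Hypothetical loss: MSE on selected neurons plus KL on softened logits."""
    # Hidden term: compare student and teacher only on the selected neurons.
    hidden_term = sum((student_hidden[i] - teacher_hidden[i]) ** 2
                      for i in selected) / max(len(selected), 1)
    # Logit term: KL(teacher || student) on temperature-softened outputs.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    logit_term = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return alpha * hidden_term + (1 - alpha) * logit_term


# Toy usage: two samples, four hidden neurons, Top-2 activation.
batch = [[0.1, 0.9, 0.5, 0.3], [0.8, 0.7, 0.1, 0.2]]
freq = activation_frequency(batch, k=2)
selected = select_frequent(freq, threshold=0.75)
```

In this toy example only the neuron that lands in the Top-2 set for both samples passes the threshold, so the hidden-unit term would constrain that neuron alone and leave the remaining units free to adapt to a new task. Here the "teacher" stands for the network state saved after the previous task, which is consistent with the abstract's claim that no replay data or task labels are required.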
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
