Densely Distilling Cumulative Knowledge for Continual Learning
Zenglin Shi, Pei Liu, Tong Su, Yunpeng Wu, Kuien Liu, Yu Song, and Meng Wang

TL;DR
This paper introduces Dense Knowledge Distillation (DKD), a novel method for continual learning that effectively distills cumulative knowledge across tasks, improving stability and generalization while managing computational costs.
Contribution
The paper proposes DKD, which partitions model outputs into dense groups for comprehensive knowledge distillation, with adaptive weighting and random group selection to enhance continual learning.
Findings
DKD outperforms state-of-the-art baselines across benchmarks.
It improves model stability and promotes flatter minima.
DKD is robust across different memory budgets and task orders.
Abstract
Continual learning, involving sequential training on diverse tasks, often faces catastrophic forgetting. While knowledge distillation-based approaches exhibit notable success in preventing forgetting, we pinpoint a limitation in their ability to distill the cumulative knowledge of all the previous tasks. To remedy this, we propose Dense Knowledge Distillation (DKD). DKD uses a task pool to track the model's capabilities. It partitions the output logits of the model into dense groups, each corresponding to a task in the task pool. It then distills all tasks' knowledge using all groups. However, because using all the groups can be computationally expensive, we also suggest random group selection in each optimization step. Moreover, we propose an adaptive weighting scheme, which balances the learning of new classes and the retention of old classes, based on the count and similarity of the classes.…
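The group-wise distillation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each group is a contiguous slice of class logits given as a hypothetical `(start, end)` index pair, that the per-group loss is a temperature-scaled KL divergence between the old and new model's softmax outputs, and that groups are weighted uniformly (the paper's adaptive weighting scheme is not reproduced here). The function names and parameters are illustrative only.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def group_distillation_loss(old_logits, new_logits, groups,
                            num_sampled=None, temperature=2.0, rng=random):
    """Average KL(old || new) over logit groups.

    groups: list of (start, end) slices, one per task in the task pool
            (assumed layout, hypothetical).
    num_sampled: if set, only this many randomly chosen groups are used
                 in the current step, mimicking random group selection.
    """
    if num_sampled is not None and num_sampled < len(groups):
        groups = rng.sample(groups, num_sampled)
    losses = []
    for start, end in groups:
        p_old = softmax(old_logits[start:end], temperature)
        p_new = softmax(new_logits[start:end], temperature)
        losses.append(kl_divergence(p_old, p_new))
    return sum(losses) / len(losses)


# Hypothetical task pool: task 1 (classes 0-1), task 2 (classes 2-3),
# and the cumulative task covering classes 0-3.
task_groups = [(0, 2), (2, 4), (0, 4)]
old_model_logits = [1.0, 2.0, 0.5, -1.0]
new_model_logits = [2.0, 1.0, 0.5, -1.0]

full_loss = group_distillation_loss(old_model_logits, new_model_logits, task_groups)
sampled_loss = group_distillation_loss(
    old_model_logits, new_model_logits, task_groups,
    num_sampled=1, rng=random.Random(0),
)
```

Sampling a subset of groups per step trades a noisier estimate of the full distillation loss for a lower per-step cost, which is the motivation the abstract gives for random group selection.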
Taxonomy
Topics: Innovative Teaching and Learning Methods · Education and Critical Thinking Development · Intelligent Tutoring Systems and Adaptive Learning
Methods: Knowledge Distillation
