Densely Distilling Cumulative Knowledge for Continual Learning

Zenglin Shi; Pei Liu; Tong Su; Yunpeng Wu; Kuien Liu; Yu Song; Meng Wang

arXiv:2405.09820 · cs.LG · May 17, 2024

Open Access

TL;DR

This paper introduces Dense Knowledge Distillation (DKD), a continual learning method that distills the cumulative knowledge of all previous tasks, improving stability and generalization while limiting computational cost through random group selection.

Contribution

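The paper proposes DKD, which partitions the model's output logits into dense groups, one per task in a task pool, and distills knowledge across those groups. Random group selection in each optimization step keeps the cost manageable, and an adaptive weighting scheme balances learning new classes against retaining old ones.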

Findings

1. DKD outperforms state-of-the-art baselines across benchmarks.
2. It improves model stability and promotes flatter minima.
3. DKD is robust across different memory budgets and task orders.

Abstract

Continual learning, which involves sequential training on diverse tasks, often suffers from catastrophic forgetting. While knowledge distillation-based approaches have shown notable success in preventing forgetting, we pinpoint a limitation in their ability to distill the cumulative knowledge of all the previous tasks. To remedy this, we propose Dense Knowledge Distillation (DKD). DKD uses a task pool to track the model's capabilities. It partitions the output logits of the model into dense groups, each corresponding to a task in the task pool. It then distills all tasks' knowledge using all groups. Because using all the groups can be computationally expensive, we also suggest random group selection in each optimization step. Moreover, we propose an adaptive weighting scheme, which balances the learning of new classes and the retention of old classes, based on the count and similarity of the classes.…
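The group-wise distillation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the representation of the task pool as `(start, end)` class-index ranges, the temperature-scaled KL divergence as the per-group loss, the plain average over groups, and all function and parameter names (`partition_logits`, `grouped_distillation_loss`, `num_selected`) are assumptions made for illustration. The adaptive weighting scheme is left out because the abstract does not specify it.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def partition_logits(logits, task_pool):
    """Slice a flat logit vector into one group per task in the pool.

    Assumption: each task is a (start, end) range of class indices.
    Ranges may overlap, e.g. to cover several tasks at once.
    """
    return [logits[start:end] for start, end in task_pool]


def grouped_distillation_loss(old_logits, new_logits, task_pool,
                              num_selected=None, temperature=2.0, rng=random):
    """Average per-group KL between the old model (teacher) and new model (student).

    If num_selected is given, only that many randomly chosen groups are used
    in this step, mirroring the random group selection idea.
    """
    group_ids = list(range(len(task_pool)))
    if num_selected is not None and num_selected < len(group_ids):
        group_ids = rng.sample(group_ids, num_selected)

    old_groups = partition_logits(old_logits, task_pool)
    new_groups = partition_logits(new_logits, task_pool)

    losses = []
    for g in group_ids:
        teacher = softmax(old_groups[g], temperature)
        student = softmax(new_groups[g], temperature)
        losses.append(kl_divergence(teacher, student))
    return sum(losses) / len(losses)
```

For example, with six classes one might call `grouped_distillation_loss(old, new, [(0, 2), (2, 4), (4, 6), (0, 6)], num_selected=2)`, where the last range is a hypothetical group spanning all classes. Sampling a subset of groups per step trades a noisier loss estimate for lower per-step cost.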

Taxonomy

Topics: Innovative Teaching and Learning Methods · Education and Critical Thinking Development · Intelligent Tutoring Systems and Adaptive Learning

Methods: Knowledge Distillation