Rethinking Momentum Knowledge Distillation in Online Continual Learning
Nicolas Michel, Maorong Wang, Ling Xiao, Toshihiko Yamasaki

TL;DR
This paper applies Momentum Knowledge Distillation (MKD) to Online Continual Learning (OCL), improving state-of-the-art accuracy on ImageNet100 by more than 10 percentage points and analyzing how MKD operates during OCL training.
Contribution
The paper introduces a direct yet effective methodology for applying MKD to flagship OCL methods, shows that it enhances those existing approaches, and analyzes MKD's internal mechanics during OCL training.
Findings
Improves state-of-the-art accuracy on ImageNet100 by over 10 percentage points.
Provides empirical analysis of MKD's impact during OCL training.
Demonstrates MKD as a central component in OCL methods.
Abstract
Online Continual Learning (OCL) addresses the problem of training neural networks on a continuous data stream where multiple classification tasks emerge in sequence. In contrast to offline Continual Learning, data can be seen only once in OCL, which is a very severe constraint. In this context, replay-based strategies have achieved impressive results and most state-of-the-art approaches heavily depend on them. While Knowledge Distillation (KD) has been extensively used in offline Continual Learning, it remains under-exploited in OCL, despite its high potential. In this paper, we analyze the challenges in applying KD to OCL and give empirical justifications. We introduce a direct yet effective methodology for applying Momentum Knowledge Distillation (MKD) to many flagship OCL methods and demonstrate its capabilities to enhance existing approaches. In addition to improving existing…
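To make the idea concrete, here is a minimal sketch of the two ingredients usually associated with momentum distillation: a teacher whose weights are an exponential moving average (EMA) of the student's, and a temperature-softened distillation loss between their outputs. This is an illustration under assumptions, not the authors' implementation: the function names, the flat-list weight representation, and the values of `alpha` and `temperature` are all hypothetical.

```python
import math


def ema_update(teacher, student, alpha=0.01):
    """Move each teacher weight a fraction `alpha` toward the student weight.

    Assumed EMA form: teacher <- (1 - alpha) * teacher + alpha * student.
    """
    return [(1.0 - alpha) * t + alpha * s for t, s in zip(teacher, student)]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher distribution (target)
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical usage: one step of teacher tracking plus a loss evaluation.
teacher_weights = ema_update([0.0, 1.0], [1.0, 1.0], alpha=0.5)
loss = distillation_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
```

In a real training loop, the distillation term would be added to the base OCL objective, and the EMA update would run after each optimizer step on the student. Because the teacher is never trained by gradient descent, it changes more slowly than the student.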
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
