Self-Supervised Training Enhances Online Continual Learning
Jhair Gallardo, Tyler L. Hayes, Christopher Kanan

TL;DR
This paper demonstrates that self-supervised pre-training methods like MoCo-V2, Barlow Twins, and SwAV significantly improve online continual learning performance on ImageNet, especially with limited pre-training data.
Contribution
The paper shows that self-supervised pre-training outperforms supervised pre-training as the initialization phase for online continual learning, producing features that generalize better to classes learned later from a non-stationary data stream.
Findings
Self-supervised pre-training outperforms supervised pre-training on ImageNet.
The performance gains are larger with fewer pre-training samples.
The best system achieves a 14.95% relative increase in top-1 accuracy over the prior state of the art for online continual learning.
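A relative increase is measured against the baseline accuracy, not in absolute percentage points. The short sketch below shows the arithmetic; the function name and the two accuracy values are hypothetical placeholders chosen for illustration and are not results from the paper.

```python
def relative_increase(new_acc: float, old_acc: float) -> float:
    """Return the relative improvement of new_acc over old_acc, in percent."""
    if old_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return 100.0 * (new_acc - old_acc) / old_acc


# Hypothetical accuracies (NOT from the paper): a baseline at 40.0% top-1
# and a new system at 46.0% top-1.
baseline_top1 = 40.0
new_top1 = 46.0

absolute_gain = new_top1 - baseline_top1          # gain in percentage points
relative_gain = relative_increase(new_top1, baseline_top1)

print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.2f}%")
```

With these placeholder values the absolute gain is 6 points while the relative gain is larger, which is why the two figures should not be compared directly.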
Abstract
In continual learning, a system must incrementally learn from a non-stationary data stream without catastrophic forgetting. Recently, multiple methods have been devised for incrementally learning classes on large-scale image classification tasks, such as ImageNet. State-of-the-art continual learning methods use an initial supervised pre-training phase, in which the first 10% - 50% of the classes in a dataset are used to learn representations in an offline manner before continual learning of new classes begins. We hypothesize that self-supervised pre-training could yield features that generalize better than supervised learning, especially when the number of samples used for pre-training is small. We test this hypothesis using the self-supervised MoCo-V2, Barlow Twins, and SwAV algorithms. On ImageNet, we find that these methods outperform supervised pre-training considerably for online…
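The protocol in the abstract, offline pre-training on an initial fraction of the classes followed by continual learning of the remaining ones, can be illustrated as a class split. This is a minimal sketch under stated assumptions: the function name `split_classes`, the seeded random class order, and the rounding rule are illustrative choices, not the authors' code.

```python
import random


def split_classes(num_classes: int, pretrain_fraction: float, seed: int = 0):
    """Split class ids into an offline pre-training set and a stream of new classes.

    Assumption: classes are shuffled with a fixed seed; the paper's actual
    class ordering may differ.
    """
    if not 0.0 < pretrain_fraction < 1.0:
        raise ValueError("pretrain_fraction must be strictly between 0 and 1")
    order = list(range(num_classes))
    random.Random(seed).shuffle(order)
    n_pretrain = round(num_classes * pretrain_fraction)
    pretrain_classes = order[:n_pretrain]   # used offline to learn representations
    stream_classes = order[n_pretrain:]     # learned incrementally afterwards
    return pretrain_classes, stream_classes


# ImageNet has 1000 classes; the abstract mentions 10% to 50% for pre-training.
for fraction in (0.10, 0.50):
    pretrain, stream = split_classes(1000, fraction)
    print(f"fraction={fraction:.2f}: {len(pretrain)} pre-training classes, "
          f"{len(stream)} classes arriving in the stream")
```

In the self-supervised setting, labels of the pre-training classes are not needed to learn the representation; only the images from those classes are used during that phase.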
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Machine Learning and Data Classification
Methods: Barlow Twins · LARS · Swapping Assignments between Views
