Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization
Francesco Pelosin, Saurav Jha, Andrea Torsello, Bogdan Raducanu, Joost van de Weijer

TL;DR
This paper explores exemplar-free continual learning in Vision Transformers, focusing on regularization of self-attention mechanisms, proposing asymmetric distillation, and demonstrating improved stability and plasticity on image recognition benchmarks.
Contribution
It introduces an asymmetric distillation approach for ViTs, enhancing continual learning by balancing stability and plasticity in attention-based regularization.
Findings
An asymmetric variant of Pooled Outputs Distillation (POD) improves plasticity without losing stability.
Regularization on attention maps enhances global contextual understanding.
ViTs show inherent low forgetting in continual learning scenarios.
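To make the asymmetric idea in the findings concrete, here is a minimal sketch of a pooled distillation distance with a symmetric and an asymmetric mode. The function name `pooled_distance`, the pooling by plain sums, and the use of a ReLU on the pooled difference are assumptions for illustration; the summary above does not spell out the exact loss, so this should not be read as the authors' formulation.

```python
import numpy as np

def pooled_distance(old_map, new_map, asymmetric=True):
    """Distance between two (tokens, tokens) maps after pooling each axis.

    With asymmetric=True only the positive part of (old - new) is kept, so
    attention that the new model adds is not penalized. This is an assumed
    form of the asymmetric penalty, not taken from the paper.
    """
    total = 0.0
    for axis in (0, 1):
        # POD-style pooling: collapse one spatial axis by summation.
        diff = old_map.sum(axis=axis) - new_map.sum(axis=axis)
        if asymmetric:
            diff = np.maximum(diff, 0.0)
        total += float(np.linalg.norm(diff))
    return total

old_map = np.array([[1.0, 0.0], [0.0, 1.0]])
new_map = np.array([[2.0, 0.0], [0.0, 1.0]])  # new model attends more strongly

added_attention_asym = pooled_distance(old_map, new_map, asymmetric=True)
added_attention_sym = pooled_distance(old_map, new_map, asymmetric=False)
lost_attention_asym = pooled_distance(new_map, old_map, asymmetric=True)
```

In this toy case the symmetric distance penalizes the extra attention of the new model, while the asymmetric one ignores it and only reacts when attention present in the old model is lost. That is one way to read "plasticity without losing stability".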
Abstract
In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation of established continual learning regularization techniques. We then examine the effect of regularization when applied to two key enablers of SAM: (a) the contextualized embedding layers, for their ability to capture well-scaled representations with respect to the values, and (b) the prescaled attention maps, for carrying value-independent global contextual information. We depict the perks of each distilling strategy on two image recognition benchmarks (CIFAR100 and ImageNet-32) -- while (a)…
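As a rough illustration of option (b), the sketch below computes scaled query-key scores for a single attention head and a simple distillation penalty between an old and a new model. It assumes that "prescaled attention maps" means the scores before softmax normalization; that reading, the helper names `prescaled_attention` and `map_distillation_loss`, the mean-squared penalty, and the random toy weights are all assumptions, not details confirmed by the abstract.

```python
import numpy as np

def prescaled_attention(tokens, w_query, w_key):
    """Scaled query-key scores for one head, before any softmax.

    Only the query and key projections are used, so the result does not
    depend on the value projection.
    """
    queries = tokens @ w_query
    keys = tokens @ w_key
    return queries @ keys.T / np.sqrt(queries.shape[-1])

def map_distillation_loss(old_scores, new_scores):
    """Mean squared difference between two score maps (assumed penalty)."""
    return float(np.mean((old_scores - new_scores) ** 2))

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))           # 5 tokens, embedding size 8
w_query_old = rng.normal(size=(8, 4))      # hypothetical old-task weights
w_key_old = rng.normal(size=(8, 4))
w_query_new = w_query_old + 0.1 * rng.normal(size=(8, 4))  # drifted weights

old_scores = prescaled_attention(tokens, w_query_old, w_key_old)
new_scores = prescaled_attention(tokens, w_query_new, w_key_old)
loss = map_distillation_loss(old_scores, new_scores)
```

Because the value projection never enters `prescaled_attention`, a penalty on these maps constrains where the model looks without constraining what it reads out, which matches the "value-independent" wording in the abstract.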
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
