Towards Exemplar-Free Continual Learning in Vision Transformers: an Account of Attention, Functional and Weight Regularization

Francesco Pelosin; Saurav Jha; Andrea Torsello; Bogdan Raducanu; Joost van de Weijer

arXiv:2203.13167 · cs.CV · May 6, 2022 · 1 citation

TL;DR

This paper explores exemplar-free continual learning in Vision Transformers (ViTs). It focuses on regularizing the self-attention mechanism, proposes an asymmetric distillation approach, and reports improved stability and plasticity on image recognition benchmarks.

Contribution

The paper introduces an asymmetric distillation approach for ViTs that balances stability and plasticity in attention-based regularization for continual learning.

Findings

01. Asymmetric pooled outputs distillation (POD) improves plasticity without losing stability.

02. Regularization on attention maps enhances global contextual understanding.

03. ViTs show inherently low forgetting in continual learning scenarios.

Abstract

In this paper, we investigate the continual learning of Vision Transformers (ViT) for the challenging exemplar-free scenario, with special focus on how to efficiently distill the knowledge of its crucial self-attention mechanism (SAM). Our work takes an initial step towards a surgical investigation of SAM for designing coherent continual learning methods in ViTs. We first carry out an evaluation of established continual learning regularization techniques. We then examine the effect of regularization when applied to two key enablers of SAM: (a) the contextualized embedding layers, for their ability to capture well-scaled representations with respect to the values, and (b) the prescaled attention maps, for carrying value-independent global contextual information. We depict the perks of each distilling strategy on two image recognition benchmarks (CIFAR100 and ImageNet-32) -- while (a)…
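To make the idea of distilling prescaled attention maps more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head, takes "prescaled" to mean the scores Q K^T / sqrt(d) before the softmax, and assumes the asymmetric penalty is a ReLU applied to (old - new), so that only attention the new model has lost is penalized. The exact loss is not given in the text above, the pooling step used by POD-style losses is omitted, and all function names and the toy data are hypothetical.

```python
import numpy as np


def prescaled_attention(x, w_q, w_k):
    """Single-head attention scores Q K^T / sqrt(d), taken before the softmax."""
    q = x @ w_q
    k = x @ w_k
    return q @ k.T / np.sqrt(q.shape[-1])


def symmetric_distill(a_old, a_new):
    """Penalize any change in the attention map, in either direction."""
    return float(np.sum((a_old - a_new) ** 2))


def asymmetric_distill(a_old, a_new):
    """Assumed asymmetric form: penalize only entries where old exceeds new."""
    lost = np.maximum(a_old - a_new, 0.0)  # ReLU of (old - new)
    return float(np.sum(lost ** 2))


rng = np.random.default_rng(0)
tokens, dim = 4, 8
x = rng.normal(size=(tokens, dim))   # toy token embeddings
w_q = rng.normal(size=(dim, dim))    # hypothetical old-task query projection
w_k = rng.normal(size=(dim, dim))    # hypothetical old-task key projection
a_old = prescaled_attention(x, w_q, w_k)

# A new model whose scores only increase is free under the asymmetric penalty.
a_gain = a_old + 1.0
# A new model whose scores only decrease is penalized equally by both losses.
a_loss = a_old - 1.0

print(symmetric_distill(a_old, a_gain), asymmetric_distill(a_old, a_gain))
print(symmetric_distill(a_old, a_loss), asymmetric_distill(a_old, a_loss))
```

Under these assumptions, the symmetric loss resists every change to the attention map, while the asymmetric one leaves the new model free to attend more strongly wherever the new task needs it. That is one plausible reading of how an asymmetric loss could add plasticity without giving up stability.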

Code & Models

Repositories

srvCodes/continual_learning_with_vit
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications