Meta-attention for ViT-backed Continual Learning
Mengqi Xue, Haofei Zhang, Jie Song, Mingli Song

TL;DR
This paper introduces MEAT, a novel attention-based method for adapting pre-trained vision transformers to continual learning tasks, achieving higher accuracy and efficiency than CNN-based approaches.
Contribution
The paper proposes MEAT, a new mask-based continual learning method for ViTs that masks only a portion of the model's parameters, improving both efficiency and accuracy over prior CNN-based methods.
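The masking idea can be illustrated with a small, framework-free sketch. It assumes a Piggyback-style scheme in which each task stores real-valued scores only for a chosen subset of frozen pre-trained weights and binarizes them into a 0/1 mask. The function names, the threshold of 0, and the toy weight values are hypothetical. This is not the authors' MEAT implementation.

```python
# Hypothetical sketch of per-task masking over a subset of frozen weights.
# Not the MEAT implementation; names and values are illustrative only.


def binarize(scores, threshold=0.0):
    """Turn real-valued per-task scores into a 0/1 mask."""
    return [1 if s > threshold else 0 for s in scores]


def masked_weights(frozen, maskable_idx, task_scores):
    """Apply a task's binary mask to the maskable subset of `frozen`.

    Weights outside `maskable_idx` are shared by all tasks and left as-is.
    The frozen list itself is never modified, so earlier tasks are unaffected.
    """
    bits = binarize(task_scores)
    out = list(frozen)  # copy: the pre-trained weights stay frozen
    for i, b in zip(maskable_idx, bits):
        out[i] = frozen[i] if b else 0.0  # zero out masked-off weights
    return out


def storage_bits_per_task(num_params, maskable_idx, bits_per_float=32):
    """Compare one binary mask with storing a full fine-tuned float copy."""
    mask_bits = len(maskable_idx)
    finetune_bits = num_params * bits_per_float
    return mask_bits, finetune_bits


frozen = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]  # toy "pre-trained" weights
maskable_idx = [1, 3, 4]                    # only this subset is maskable

# Each task keeps its own scores (learned in practice; fixed here).
task_scores = {"task_a": [0.7, -0.2, 0.1], "task_b": [-0.5, 0.9, 0.4]}

weights_a = masked_weights(frozen, maskable_idx, task_scores["task_a"])
weights_b = masked_weights(frozen, maskable_idx, task_scores["task_b"])
print(weights_a)  # → [0.5, -1.2, 0.8, 0.0, -0.3, 1.1]
print(weights_b)  # → [0.5, 0.0, 0.8, 2.0, -0.3, 1.1]
print(storage_bits_per_task(len(frozen), maskable_idx))  # → (3, 192)
```

Switching tasks here means swapping masks rather than weights. The toy storage comparison (3 mask bits versus 192 bits for a float32 copy of six weights) only illustrates why restricting masks to a subset of parameters keeps overhead low; it says nothing about MEAT's actual numbers.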
Findings
MEAT outperforms CNN-based methods by 4-6% in accuracy.
MEAT is more efficient, requiring less parameter overhead.
Extensive experiments validate MEAT's superiority in continual learning for ViTs.
Abstract
Continual learning is a longstanding research topic due to its crucial role in tackling continually arriving tasks. To date, the study of continual learning in computer vision has been mainly restricted to convolutional neural networks (CNNs). Recently, however, the newly emerging vision transformers (ViTs) have been gradually dominating the field of computer vision, which leaves CNN-based continual learning lagging behind: CNN-based methods can suffer severe performance degradation if applied straightforwardly to ViTs. In this paper, we study ViT-backed continual learning to strive for higher performance, building on recent advances in ViTs. Inspired by mask-based continual learning methods in CNNs, where a mask is learned per task to adapt the pre-trained CNN to the new task, we propose MEta-ATtention (MEAT), i.e., attention to self-attention, to adapt a pre-trained ViT to new…
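One hypothetical way to read "attention to self-attention" is a per-task mask that decides which entries of a frozen self-attention map may contribute. The pure-Python sketch below assumes a binary mask over key tokens for a single query and a single head. The mask granularity, the function names, and the toy vectors are assumptions for illustration, not MEAT's actual formulation.

```python
import math


def softmax(xs):
    """Numerically stable softmax; -inf entries receive zero weight."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(query, keys, values, task_mask):
    """Single-query scaled dot-product attention with a per-task token mask.

    Hypothetical: the query/key/value vectors stand in for outputs of frozen
    pre-trained weights; only `task_mask` (1 = attend, 0 = ignore) changes
    from task to task.
    """
    if not any(task_mask):
        raise ValueError("at least one token must remain unmasked")
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Masked-out tokens get -inf logits, so softmax assigns them weight 0.
    logits = [z if m else float("-inf") for z, m in zip(logits, task_mask)]
    weights = softmax(logits)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]

out_a, w_a = masked_attention(query, keys, values, [1, 1, 0])  # task A ignores token 3
out_b, w_b = masked_attention(query, keys, values, [1, 1, 1])  # task B attends to all
print([round(x, 3) for x in out_a])
print([round(x, 3) for x in out_b])
```

Both calls share the same frozen vectors and differ only in the mask, which is the property mask-based continual learning relies on to avoid overwriting earlier tasks. A trainable version would learn real-valued mask scores (e.g., with a straight-through estimator); this sketch omits training entirely.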
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
