Improving Vision Transformers for Incremental Learning
Pei Yu, Yinpeng Chen, Ying Jin, Zicheng Liu

TL;DR
This paper develops a practical method for applying Vision Transformers to class incremental learning, addressing convergence and bias issues, and achieves state-of-the-art results on CIFAR and ImageNet datasets.
Contribution
It introduces ViTIL (ViT for Incremental Learning), a recipe that combines existing techniques to adapt Vision Transformers to class incremental learning, resolving three issues that arise when ViT naively replaces a CNN.
Findings
ViTIL outperforms previous methods on CIFAR and ImageNet datasets.
ViTIL addresses ViT's slow convergence when the number of classes is small and its stronger bias towards new classes.
ViTIL achieves state-of-the-art results on all three class incremental learning setups.
Abstract
This paper proposes a working recipe of using Vision Transformer (ViT) in class incremental learning. Although this recipe only combines existing techniques, developing the combination is not trivial. Firstly, naive application of ViT to replace convolutional neural networks (CNNs) in incremental learning results in serious performance degradation. Secondly, we nail down three issues of naively using ViT: (a) ViT has very slow convergence when the number of classes is small, (b) more bias towards new classes is observed in ViT than CNN-based architectures, and (c) the conventional learning rate of ViT is too low to learn a good classifier layer. Finally, our solution, named ViTIL (ViT for Incremental Learning) achieves new state-of-the-art on both CIFAR and ImageNet datasets for all three class incremental learning setups by a clear margin. We believe this advances the knowledge of…
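Issue (b) above, bias towards new classes, can be made concrete with a small diagnostic. The sketch below is an illustration only and is not taken from the paper: the function name `new_class_bias` and the metric itself (the fraction of old-class samples that are predicted as a new class) are assumptions chosen for clarity.

```python
from typing import Sequence, Set


def new_class_bias(
    labels: Sequence[int], preds: Sequence[int], new_classes: Set[int]
) -> float:
    """Fraction of old-class samples that are predicted as a new class.

    Hypothetical diagnostic, not the paper's metric: a value near 0 means
    old classes are rarely confused with newly added ones.
    """
    old_total = 0
    old_as_new = 0
    for y, p in zip(labels, preds):
        if y in new_classes:
            continue  # only old-class samples count toward this diagnostic
        old_total += 1
        if p in new_classes:
            old_as_new += 1
    if old_total == 0:
        raise ValueError("no old-class samples in the evaluation set")
    return old_as_new / old_total


# Toy data: classes 0-1 are old, classes 2-3 were added in the latest step.
labels = [0, 0, 1, 1, 2, 3]
preds = [0, 2, 3, 1, 2, 3]
bias = new_class_bias(labels, preds, new_classes={2, 3})
print(bias)  # 0.5: two of the four old-class samples land on new classes
```

Comparing this value for a ViT and a CNN trained under the same incremental schedule would be one way to check whether the ViT leans more heavily towards new classes.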
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Machine Learning and ELM
Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Label Smoothing · Adam · Multi-Head Attention · Residual Connection · Absolute Position Encodings · Byte Pair Encoding
