Vision Transformers that Never Stop Learning
Caihao Sun, Mingqi Yuan, Shiyuan Wang, Jiayu Chen

TL;DR
This paper investigates loss of plasticity in Vision Transformers (ViTs), finds that attention modules become increasingly unstable, and proposes ARROW, a geometry-aware optimizer that better preserves plasticity and improves continual learning performance.
Contribution
The paper provides a systematic analysis of plasticity loss in ViTs and introduces ARROW, a novel optimizer that adaptively preserves plasticity in attention-based models.
Findings
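Stacked attention modules exhibit increasing instability, which exacerbates plasticity loss.
Feed-forward network modules suffer even more pronounced degradation.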
Parameter re-initialization methods are ineffective in restoring plasticity.
ARROW improves plasticity and continual learning performance.
Abstract
Loss of plasticity refers to the progressive inability of a model to adapt to new tasks and poses a fundamental challenge for continual learning. While this phenomenon has been extensively studied in homogeneous neural architectures, such as multilayer perceptrons, its mechanisms in structurally heterogeneous, attention-based models such as Vision Transformers (ViTs) remain underexplored. In this work, we present a systematic investigation of loss of plasticity in ViTs, including a fine-grained diagnosis using local metrics that capture parameter diversity and utilization. Our analysis reveals that stacked attention modules exhibit increasing instability that exacerbates plasticity loss, while feed-forward network modules suffer even more pronounced degradation. Furthermore, we evaluate several approaches for mitigating plasticity loss. The results indicate that methods based on…
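The abstract mentions local metrics for parameter diversity and utilization without defining them in the text shown here. As a rough illustration only, the sketch below computes two generic proxies of that kind: the fraction of dormant units in a layer (utilization) and the mean absolute cosine similarity between weight rows (diversity). The function names, the 0.01 threshold, and the choice of proxies are assumptions made for this example; they are not taken from the paper and are not the ARROW method.

```python
import math


def dormant_fraction(activations, threshold=0.01):
    """Fraction of units whose normalized mean |activation| is <= threshold.

    activations: list of samples, each a list of per-unit activations.
    Hypothetical utilization proxy, not the paper's metric.
    """
    n_samples = len(activations)
    n_units = len(activations[0])
    # Mean absolute activation of each unit over the batch.
    mean_abs = [
        sum(abs(sample[j]) for sample in activations) / n_samples
        for j in range(n_units)
    ]
    layer_mean = sum(mean_abs) / n_units
    if layer_mean == 0.0:
        return 1.0  # every unit is silent
    return sum(1 for m in mean_abs if m / layer_mean <= threshold) / n_units


def mean_pairwise_cosine(weights):
    """Mean |cosine similarity| over all pairs of weight rows.

    weights: list of rows, one row per unit. Values near 1 suggest the
    rows have collapsed onto similar directions (low diversity).
    Hypothetical diversity proxy, not the paper's metric.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in weights]
    total, pairs = 0.0, 0
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            pairs += 1
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # treat a zero row as contributing similarity 0
            dot = sum(a * b for a, b in zip(weights[i], weights[j]))
            total += abs(dot / (norms[i] * norms[j]))
    return total / pairs if pairs else 0.0
```

Tracking quantities like these per module (for example, separately for attention and feed-forward blocks) across tasks is one way to see where degradation concentrates, though the paper's own metrics may differ.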
Taxonomy
Topics: Advanced Memory and Neural Computing · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
