KeepLoRA: Continual Learning with Residual Gradient Adaptation
Mao-Lin Luo, Zi-Hao Zhou, Yi-Lin Zhang, Yuanyu Wan, Tong Wei, Min-Ling Zhang

TL;DR
KeepLoRA is a continual learning method that preserves pre-trained knowledge and adapts to new tasks by restricting updates to the residual subspace, achieving state-of-the-art results.
Contribution
It introduces a novel residual gradient adaptation approach that balances knowledge retention and plasticity in continual learning for vision-language models.
Findings
Achieves state-of-the-art performance in continual learning tasks
Effectively balances knowledge retention and plasticity
Theoretical analysis supports empirical results
Abstract
Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents a simple but effective approach called KeepLoRA to balance these objectives. We first analyze the knowledge retention mechanism within the model parameter space and find that general knowledge is mainly encoded in the principal subspace, while task-specific knowledge is encoded in the residual subspace. Motivated by this finding, KeepLoRA learns new tasks by restricting LoRA parameter updates to the residual subspace to avoid interfering with previously learned capabilities. Specifically, we infuse knowledge for a new task by projecting its gradient onto a subspace orthogonal to both the principal subspace…
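The gradient projection described in the abstract can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`principal_basis`, `merge_bases`, `project_to_residual`) are hypothetical, the principal subspace is assumed to be the top-k right singular vectors of a pre-trained weight with a fixed k, the previous-task directions are random stand-ins, and the gradient is a full-weight gradient rather than a gradient on LoRA factors.

```python
import numpy as np

def principal_basis(W, k):
    """Top-k right singular vectors of W (d_out x d_in), returned as d_in x k.
    Assumption: used as a stand-in for the 'principal subspace' of W."""
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:k].T

def merge_bases(*bases, tol=1e-10):
    """Orthonormal basis for the span of all given columns. SVD-based, so
    linearly dependent (overlapping) directions are dropped robustly."""
    stacked = np.concatenate(bases, axis=1)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, s > tol * s.max()]

def project_to_residual(G, B):
    """Return G (I - B B^T): the part of gradient G (d_out x d_in) that does
    not act on span(B). Any update along it maps every v in span(B) to 0."""
    return G - (G @ B) @ B.T

rng = np.random.default_rng(0)
d_out, d_in = 64, 32
W = rng.standard_normal((d_out, d_in))        # stand-in for a pre-trained weight
B_pre = principal_basis(W, k=8)               # protected principal directions
prev_dirs = rng.standard_normal((d_in, 4))    # stand-in for earlier task feature directions
B = merge_bases(B_pre, prev_dirs)             # unified protected subspace (12 dims here)

G = rng.standard_normal((d_out, d_in))        # stand-in for the new task's gradient
G_res = project_to_residual(G, B)

print(B.shape)                                # (32, 12)
print(np.allclose(G_res @ B, 0.0))            # True: no action on protected inputs
```

Because `G_res @ B` is zero, a step `W - lr * G_res` leaves `W @ v` unchanged for every input `v` in the protected subspace, so plasticity comes only from the remaining `d_in - 12` directions. SVD is used in `merge_bases` instead of unpivoted QR, which can keep or drop the wrong direction when the stacked bases overlap.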
Peer Reviews
Decision: ICLR 2026 Poster
1. Clear idea of separating general (principal) and specific (residual) knowledge in the parameter space. 2. Provides strong theoretical justification connecting the method to optimal, constrained gradient descent. 3. Good practicality as it adds no inference overhead, unlike architecture-extension methods.
1. Unclear if the unified principal subspace, which accumulates past task directions, can scale to a large number of tasks without prohibitive cost. 2. The method requires expensive per-task full-gradient computation and a one-time full-model SVD, which are not fully benchmarked. 3. Relies on crucial hyperparameters (e.g., $\epsilon_w$, $\epsilon_f$) whose robustness and sensitivity are not deeply analyzed.
1. KeepLoRA works with pre-trained models and addresses three competing objectives: maintaining the ability to learn new knowledge (plasticity), preventing the forgetting of previously learned tasks (backward stability), and preserving the general pre-trained knowledge that guarantees general transferability (forward stability). 2. KeepLoRA can be implemented in a relatively straightforward manner. 3. Sensible theoretical analysis and strong empirical performance.
1. KeepLoRA+ outperforms KeepLoRA, but KeepLoRA+ is first mentioned only in Table 2 and Section 4.1. 2. KeepLoRA requires storing the dominant singular vectors from M tasks, but this storage is not listed in Table 2 or analyzed elsewhere. 3. KeepLoRA introduces the hyperparameters $\epsilon_w$, $\epsilon_f$, $r$, and $\alpha$, but only the sensitivity of $\epsilon_w$ (vision) and $\epsilon_w$ (text) is presented.
1. State-of-the-Art Performance: The paper demonstrates that KeepLoRA and its variant, KeepLoRA+, achieve state-of-the-art results on the MTIL benchmark, outperforming previous methods on all key metrics. 2. Strong Empirical Analysis: The paper's core hypothesis is based on a clear, intuitive analysis of the model's parameter space. The finding that general knowledge resides in the principal subspace while task-specific knowledge resides in the residual subspace provides a solid foundation for the me…
1. Limited Evaluation on Language: The paper focuses on vision-language models (VLMs), but the evaluation uses a benchmark of 11 image classification datasets. The "language" side is used only for zero-shot classification via class names, so the method's effectiveness on more complex, language-heavy VLM tasks is unproven. 2. Your method, by design, projects new task gradients to be orthogonal to the principal subspace and all previous task directions to ensure stability. Does this str…
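Two of the reviews flag the thresholds $\epsilon_w$ and $\epsilon_f$ as crucial but only partly analyzed. This page does not define them; a common convention in gradient-projection methods, assumed here and not confirmed by the source, is an energy threshold that keeps the smallest rank whose singular values capture a fraction ε of the total squared spectral energy. A minimal sketch of that rule (the function name `rank_for_energy` and the toy spectrum are hypothetical) shows how modest changes in ε change the retained rank, and with it how much of the space is protected rather than left free for new tasks.

```python
import numpy as np

def rank_for_energy(singular_values, eps):
    """Smallest k whose top-k singular values hold at least a fraction eps
    of the total squared spectral energy (assumed thresholding rule)."""
    s2 = np.sort(np.asarray(singular_values, dtype=float))[::-1] ** 2
    ratios = np.cumsum(s2) / s2.sum()          # cumulative energy fraction
    k = int(np.searchsorted(ratios, eps)) + 1  # first index reaching eps, as a rank
    return min(k, len(s2))                     # guard against eps == 1.0 and rounding

s = [4.0, 2.0, 1.0, 1.0]   # toy spectrum, not from the paper
for eps in (0.5, 0.9, 0.95, 0.99):
    print(eps, rank_for_energy(s, eps))
# 0.5 1
# 0.9 2
# 0.95 3
# 0.99 4
```

On a real spectrum with a long, flat tail, the retained rank can grow sharply as ε approaches 1, which is one reason a sensitivity analysis for these thresholds is useful.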
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
