KeepLoRA: Continual Learning with Residual Gradient Adaptation

Mao-Lin Luo; Zi-Hao Zhou; Yi-Lin Zhang; Yuanyu Wan; Tong Wei; Min-Ling Zhang

arXiv:2601.19659 · cs.CV · January 28, 2026

Open Access 3 Reviews

TL;DR

KeepLoRA is a continual learning method that preserves pre-trained knowledge and adapts to new tasks by restricting LoRA updates to the residual subspace, achieving state-of-the-art results.

Contribution

It introduces a novel residual gradient adaptation approach that balances knowledge retention and plasticity in continual learning for vision-language models.

Findings

1. Achieves state-of-the-art performance on continual learning benchmarks
2. Effectively balances knowledge retention and plasticity
3. Theoretical analysis supports the empirical results

Abstract

Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents a simple but effective approach, KeepLoRA, that balances these objectives. We first analyze the knowledge retention mechanism within the model parameter space and find that general knowledge is mainly encoded in the principal subspace, while task-specific knowledge is encoded in the residual subspace. Motivated by this finding, KeepLoRA learns new tasks by restricting LoRA parameter updates to the residual subspace so that they do not interfere with previously learned capabilities. Specifically, we infuse knowledge for a new task by projecting its gradient onto a subspace orthogonal to both the principal subspace…
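The gradient projection described in the abstract can be illustrated with a small NumPy sketch. Because the abstract is truncated, every specific below is an assumption rather than the authors' procedure: the principal subspace is taken to be the top left singular vectors of a pre-trained weight `W` that capture a fraction `eps_w` of its squared singular-value energy, previous tasks are represented by a hypothetical orthonormal matrix `U_prev`, and a full gradient `G` is projected onto the orthogonal complement of their combined span. The helper names `principal_basis`, `union_basis`, and `project_to_residual` are illustrative only.

```python
import numpy as np

def principal_basis(W: np.ndarray, eps_w: float) -> np.ndarray:
    """Orthonormal basis of a 'principal subspace' of W.

    Assumption: keep the fewest left singular vectors whose cumulative
    squared singular values reach a fraction eps_w of the total energy.
    """
    U, S, _ = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(S**2) / np.sum(S**2)
    k = min(int(np.searchsorted(energy, eps_w)) + 1, len(S))
    return U[:, :k]

def union_basis(*bases: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Orthonormal basis spanning all given column bases, dropping redundant directions."""
    M = np.concatenate(bases, axis=1)
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, S > tol * S.max()]

def project_to_residual(G: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Remove every component of G lying in span(Q); Q must have orthonormal columns."""
    return G - Q @ (Q.T @ G)

# Toy demo with random matrices (illustrative values, not from the paper).
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 12))                       # stand-in for a pre-trained weight
U_p = principal_basis(W, eps_w=0.8)                     # principal subspace of W
U_prev = np.linalg.qr(rng.standard_normal((16, 2)))[0]  # hypothetical past-task directions
Q = union_basis(U_p, U_prev)                            # combined protected subspace
G = rng.standard_normal((16, 12))                       # stand-in for the new task's full gradient
G_res = project_to_residual(G, Q)                       # gradient restricted to the residual subspace

print("principal dims:", U_p.shape[1], "protected dims:", Q.shape[1])
print("max |Q^T G_res|:", np.abs(Q.T @ G_res).max())
```

Projecting the column (output) side with `G - Q @ (Q.T @ G)` is a choice made for this sketch. Since `Q.T @ G_res` is zero, any step along `G_res` leaves `Q.T @ W` unchanged, which is the sense in which the protected directions are preserved.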

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. Clear idea of separating general (principal) and specific (residual) knowledge in the parameter space.
2. Provides strong theoretical justification connecting the method to optimal, constrained gradient descent.
3. Good practicality, as it adds no inference overhead, unlike architecture-extension methods.

Weaknesses

1. Unclear if the unified principal subspace, which accumulates past task directions, can scale to a large number of tasks without prohibitive cost.
2. The method requires expensive per-task full-gradient computation and a one-time full-model SVD, which are not fully benchmarked.
3. Relies on crucial hyperparameters (e.g., $\epsilon_w$, $\epsilon_f$) whose robustness and sensitivity are not deeply analyzed.
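The scaling and storage concerns can be made concrete with a toy model of how a unified basis might grow. The reviews do not give KeepLoRA's exact update rule, so this sketch swaps in a Gradient Projection Memory (GPM)-style rule as an assumption: after each task, a hypothetical task matrix `F` (for example, stacked gradients or features) contributes only the residual directions needed for the basis to capture a fraction `eps_f` of `F`'s energy. The function `extend_basis` and all sizes are illustrative. The point it shows is that storage is `d × k` floats per layer, where `k` grows with the number of tasks but can never exceed `d`.

```python
import numpy as np

def extend_basis(Q: np.ndarray, F: np.ndarray, eps_f: float) -> np.ndarray:
    """GPM-style update (assumption, not the paper's rule).

    Append the fewest new orthonormal directions so that span(Q) captures at
    least a fraction eps_f (< 1) of F's squared Frobenius norm.
    """
    total = np.sum(F**2)
    captured = np.sum((Q.T @ F) ** 2)   # energy of F already inside span(Q)
    R = F - Q @ (Q.T @ F)               # part of F orthogonal to the current basis
    U, S, _ = np.linalg.svd(R, full_matrices=False)
    k = 0
    while captured < eps_f * total and k < len(S):
        captured += S[k] ** 2           # each residual direction adds S[k]^2 of energy
        k += 1
    return np.concatenate([Q, U[:, :k]], axis=1)

rng = np.random.default_rng(0)
d = 32                                  # layer width (toy value)
Q = np.zeros((d, 0))                    # unified basis starts empty
sizes = []
for task in range(5):
    # Hypothetical task matrix: rank-4 signal plus small noise.
    F = rng.standard_normal((d, 4)) @ rng.standard_normal((4, 20))
    F += 0.01 * rng.standard_normal((d, 20))
    Q = extend_basis(Q, F, eps_f=0.95)
    sizes.append(Q.shape[1])

print("basis size after each task:", sizes)
print("stored floats for this layer:", Q.size)
```

Because each task adds only directions not already spanned, the basis size is capped at `d`; this bounds storage but also shrinks the `d - k`-dimensional residual space left for later tasks.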

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. KeepLoRA works with pre-trained models and addresses three competing objectives: maintaining the ability to learn new knowledge (plasticity), preventing the forgetting of previously learned tasks (backward stability), and preserving the general pre-trained knowledge that guarantees general transferability (forward stability).
2. KeepLoRA can be implemented in a relatively straightforward manner.
3. Sensible theoretical analysis and strong empirical performance.

Weaknesses

1. KeepLoRA+ outperforms KeepLoRA, but KeepLoRA+ is first mentioned only in Table 2 and §4.1.
2. KeepLoRA requires storing dominant singular vectors from tasks M, but this storage cost is neither listed in Table 2 nor analyzed elsewhere.
3. KeepLoRA introduces the hyperparameters $\epsilon_w$, $\epsilon_f$, $r$, and $\alpha$, but only the sensitivity of $\epsilon_w$ (vision) and $\epsilon_w$ (text) is presented.

Reviewer 03 · Rating 4 · Confidence 5

Strengths

1. State-of-the-Art Performance: The paper demonstrates that KeepLoRA and its variant, KeepLoRA+, achieve state-of-the-art results on the MTIL benchmark, outperforming previous methods on all key metrics.
2. Strong Empirical Analysis: The paper's core hypothesis is based on a clear, intuitive analysis of the model's parameter space. The finding that general knowledge resides in the principal subspace while task-specific knowledge resides in the residual subspace provides a solid foundation for the me…

Weaknesses

1. Limited Evaluation on Language: The paper focuses on Vision-Language Models (VLMs), but the evaluation is performed on a benchmark of 11 image classification datasets. The "language" aspect is only used for zero-shot classification via class names. The method's effectiveness on more complex, language-heavy VLM tasks is unproven.
2. Your method, by design, projects new task gradients to be orthogonal to the principal subspace and all previous task directions to ensure stability. Does this str…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications