SplitLoRA: Balancing Stability and Plasticity in Continual Learning Through Gradient Space Splitting
Haomiao Qiu, Miao Zhang, Ziyue Qiao, Weili Guan, Min Zhang, Liqiang Nie

TL;DR
SplitLoRA introduces a novel gradient space partitioning method based on Low-Rank Adaptation to improve the balance of stability and plasticity in continual learning, achieving state-of-the-art results.
Contribution
It provides a theoretical analysis and a new method for optimal gradient space partitioning in continual learning using Low-Rank Adaptation.
Findings
Achieves state-of-the-art performance on multiple datasets.
Effectively balances stability and plasticity in continual learning.
Outperforms existing gradient projection methods.
Abstract
Continual Learning (CL) requires a model to learn multiple tasks in sequence while maintaining both stability (preserving knowledge from previously learned tasks) and plasticity (effectively learning new tasks). Gradient projection has emerged as an effective and popular paradigm in CL, where it partitions the gradient space of previously learned tasks into two orthogonal subspaces: a primary subspace and a minor subspace. New tasks are learned effectively within the minor subspace, thereby reducing interference with previously acquired knowledge. However, existing gradient projection methods struggle to achieve an optimal balance between plasticity and stability, as it is hard to appropriately partition the gradient space. In this work, we consider a continual learning paradigm based on Low-Rank Adaptation, which has gained considerable attention due to its efficiency and wide applicability,…
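The partitioning described in the abstract can be illustrated with a minimal NumPy sketch. It shows generic gradient projection only, not the SplitLoRA partitioning rule, which the truncated abstract does not spell out. The function names, the synthetic gradients, and the split size `k` are assumptions made for illustration.

```python
import numpy as np


def split_gradient_space(old_grads, k):
    """Split the gradient space of earlier tasks into two orthogonal subspaces.

    old_grads: (d, n) matrix whose columns are gradients from earlier tasks.
    k: number of leading singular directions kept as the primary subspace
       (a hypothetical knob; choosing it well is the hard part).
    Returns orthonormal bases (primary, minor) of shapes (d, k) and (d, d - k).
    """
    # full_matrices=True gives a complete orthonormal basis of R^d.
    U, _, _ = np.linalg.svd(old_grads, full_matrices=True)
    return U[:, :k], U[:, k:]


def project_to_minor(grad, minor):
    """Keep only the component of a new-task gradient inside the minor subspace."""
    return minor @ (minor.T @ grad)


rng = np.random.default_rng(0)
d, n, k = 8, 20, 3

# Synthetic old-task gradients: mostly inside a 3-dimensional subspace, plus noise.
basis, _ = np.linalg.qr(rng.normal(size=(d, 3)))
old_grads = basis @ rng.normal(size=(3, n)) + 1e-3 * rng.normal(size=(d, n))

primary, minor = split_gradient_space(old_grads, k)
new_grad = rng.normal(size=d)
projected = project_to_minor(new_grad, minor)

# The projected update has no component along the primary directions.
print(np.abs(primary.T @ projected).max())
```

A larger `k` protects more old-task directions (stability) but leaves a smaller minor subspace for the new task (plasticity), which is the trade-off the abstract refers to.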
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper establishes a theoretical foundation for the proposed method by analyzing gradient projection and minor subspace, and by formulating a corresponding optimization problem. 2. SplitLoRA attains state-of-the-art performance across three benchmarks under different task configurations.
1. A major concern is the absence of evaluation metrics. In continual learning, it is critical to assess the stability-plasticity trade-off using both average accuracy (like FAA and CAA reported by authors) and backward transfer (BWT). The omission of BWT weakens the credibility of the empirical results and raises doubts about the robustness of the performance evaluation. 2. Another issue is the limited selection of baselines in experimental comparison. As mentioned in Section 2.2, Gradient Proj
This paper utilizes LoRA in the continual learning framework and uses the gradient to update the process. An upper bound analysis on the loss increase is provided. Although I have not checked the entire proof, based on previous results, Equation 7 is more or less correct. The SplitLoRA algorithm is also presented.
The overall paper is not well written. The main motivation for balancing plasticity and stability is addressed through the use of LoRA. However, LoRA may reduce information, and it is unclear why this method has this effect. Please provide more explanation to clarify the motivation and core contribution of the approach. The modified method uses LoRA to replace the corresponding update, which is more akin to a report. Regarding the dataset and model, they are outdated and not suitable for the c
The paper articulates its core idea well: orthogonally splitting the gradient space via SVD into two complementary subspaces, one for previously learned tasks and the other for the new task. The motivation is simple, but the authors provide a theoretical analysis of the impact of subspace partitioning on model stability and plasticity for CL. The proposed SplitLoRA method demonstrates consistently good performance against the SOTA.
It is arguable that orthogonal projection might not be the ultimate solution to catastrophic forgetting, although the proposed SplitLoRA could still match some of our current development of CL. The capacity of the proposed SplitLoRA, in terms of the number of well-learned tasks, is not sufficiently discussed. It seems that partitioning the gradient space between previously learned tasks and the new task does not affect the number of tasks that can be learned.
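The metrics named in the first review (FAA, CAA, and the requested BWT) can be computed from a matrix of per-task accuracies. The sketch below assumes the definitions commonly used in the continual learning literature; the paper's exact definitions are not given here, and the function name and accuracy values are made up for illustration.

```python
import numpy as np


def continual_metrics(acc):
    """Compute FAA, CAA and BWT from an accuracy matrix.

    acc[t][i] is the accuracy on task i after training on task t; only
    entries with i <= t are used. Definitions are the commonly used ones
    (an assumption, not taken from the paper).
    """
    acc = np.asarray(acc, dtype=float)
    T = acc.shape[0]
    # Final average accuracy: mean over all tasks after the last task.
    faa = acc[T - 1, :].mean()
    # Cumulative average accuracy: mean of the running average accuracies.
    caa = np.mean([acc[t, : t + 1].mean() for t in range(T)])
    # Backward transfer: change on each earlier task since it was learned.
    bwt = np.mean([acc[T - 1, i] - acc[i, i] for i in range(T - 1)])
    return faa, caa, bwt


# Hypothetical accuracies for three sequential tasks.
acc = [
    [0.90, 0.00, 0.00],
    [0.85, 0.80, 0.00],
    [0.80, 0.75, 0.70],
]
faa, caa, bwt = continual_metrics(acc)
print(f"FAA={faa:.3f} CAA={caa:.3f} BWT={bwt:.3f}")
```

A negative BWT indicates forgetting on earlier tasks, which is why the reviewer treats it as a complement to the average-accuracy numbers.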
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Advanced Neural Network Applications
Methods: Softmax · Attention Is All You Need
