Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning

Joana Pasquali; Ramiro N. Barros; Arthur S. Bianchessi; Vinícius Conte Turani; João Vitor Boer Abitante; Rafaela Cappelari Ravazio; Christian Mattjie; Otávio Parraga; Lucas S. Kupssinskü; Rodrigo C. Barros

arXiv:2605.12752 · cs.LG · May 14, 2026

TL;DR

This paper introduces SLICE, a gradient-surgery-based initialization method for LoRA adapters that targets catastrophic forgetting in continual learning by reconciling current-task and replay gradients and initializing the adapter from a truncated SVD of the result.

Contribution

SLICE is a novel initialization technique for LoRA adapters in continual learning, leveraging gradient reconciliation and SVD to enhance stability and plasticity.

Findings

01. SLICE outperforms vanilla LoRA, LoRA-GA, and LoRAM in stability-plasticity trade-offs.

02. SLICE improves Average Performance and Final Performance and reduces Forgetting.

03. SLICE maintains General Performance and In Context Performance across various sequences.

Abstract

LoRA is widely adopted for continual fine-tuning of Large Language Models due to its parameter efficiency, modularity across tasks, and compatibility with replay strategies. However, LoRA-based continual learning remains vulnerable to catastrophic forgetting, whose severity depends on how successive task gradients interact: when consecutive task gradients conflict, standard adapter initializations channel updates into subspaces that overwrite previously learned directions. We propose SLICE, a gradient-surgery-based initialization for LoRA adapters in continual learning. SLICE accumulates gradients from both the current task and a replay buffer of prior tasks, reconciles them through a projection operator, and decomposes the result via truncated SVD to initialize the adapter weights. We evaluate SLICE on the TRACE benchmark and sequences of Super-NI tasks, including a set of adversarial…
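The pipeline described in the abstract (accumulate gradients, reconcile them with a projection, then take a truncated SVD) can be illustrated with a minimal NumPy sketch. Several details here are assumptions rather than the paper's method: the abstract does not specify the projection operator, so a PCGrad-style projection is swapped in; `g_new` and `g_old` stand for already-accumulated gradient matrices of a single weight matrix (current task and replay buffer); and the split of singular values between the two LoRA factors, along with the names `reconcile` and `init_adapter`, is hypothetical.

```python
import numpy as np


def reconcile(g_new, g_old):
    """PCGrad-style projection (an assumption, not necessarily SLICE's operator).

    If the gradients conflict (negative inner product), remove from g_new its
    component along g_old; otherwise return g_new unchanged.
    """
    dot = float(np.sum(g_new * g_old))
    if dot >= 0.0:
        return g_new
    return g_new - (dot / (float(np.sum(g_old * g_old)) + 1e-12)) * g_old


def init_adapter(g_new, g_old, rank):
    """Build LoRA factors B (d_out x r) and A (r x d_in) from a truncated SVD.

    The square-root split of singular values between B and A is an
    illustrative choice, not taken from the paper.
    """
    g = reconcile(g_new, g_old)
    u, s, vt = np.linalg.svd(g, full_matrices=False)
    root = np.sqrt(s[:rank])
    b = u[:, :rank] * root            # scale the top-r left singular vectors
    a = root[:, None] * vt[:rank, :]  # scale the top-r right singular vectors
    return b, a, g


# Toy example: a current-task gradient that conflicts with the replay gradient.
g_old = np.arange(12, dtype=float).reshape(4, 3)
g_new = -g_old + np.eye(4, 3)
B, A, G = init_adapter(g_new, g_old, rank=2)
print(B.shape, A.shape)
```

In this sketch `B @ A` is the best rank-`r` approximation of the reconciled gradient, so the adapter starts in a subspace that no longer opposes the replay gradient. Any sign convention or scaling applied before training is left out, since the abstract does not describe it.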
