Low-Rank Adapters Initialization via Gradient Surgery for Continual Learning
Joana Pasquali, Ramiro N. Barros, Arthur S. Bianchessi, Vinícius Conte Turani, João Vitor Boer Abitante, Rafaela Cappelari Ravazio, Christian Mattjie, Otávio Parraga, Lucas S. Kupssinskü, Rodrigo C. Barros

TL;DR
This paper introduces SLICE, a gradient-surgery-based initialization method for LoRA adapters that reduces catastrophic forgetting in continual learning by reconciling task gradients and decomposing the result with truncated SVD.
Contribution
SLICE is a novel initialization technique for LoRA adapters in continual learning, leveraging gradient reconciliation and SVD to enhance stability and plasticity.
Findings
SLICE outperforms vanilla LoRA, LoRA-GA, and LoRAM in stability-plasticity trade-offs.
SLICE improves Average Performance and Final Performance and reduces Forgetting.
SLICE maintains General Performance and In-Context Performance across various sequences.
Abstract
LoRA is widely adopted for continual fine-tuning of Large Language Models due to its parameter efficiency, modularity across tasks, and compatibility with replay strategies. However, LoRA-based continual learning remains vulnerable to catastrophic forgetting, whose severity depends on how successive task gradients interact: when consecutive task gradients conflict, standard adapter initializations channel updates into subspaces that overwrite previously learned directions. We propose SLICE, a gradient-surgery-based initialization for LoRA adapters in continual learning. SLICE accumulates gradients from both the current task and a replay buffer of prior tasks, reconciles them through a projection operator, and decomposes the result via truncated SVD to initialize the adapter weights. We evaluate SLICE on the TRACE benchmark and sequences of Super-NI tasks, including a set of adversarial…
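The pipeline described in the abstract (reconcile the current-task gradient with a replay gradient, then factor the result with truncated SVD) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the projection operator, so a PCGrad-style projection is swapped in as an assumption. The function names `reconcile` and `init_lora_from_gradient`, and the choice to split the singular values evenly between the two factors, are likewise hypothetical. Sign, scaling, and any compensation of the frozen base weights are left out.

```python
import numpy as np

def reconcile(g_new, g_old):
    """Gradient surgery step (assumed PCGrad-style, not taken from the paper).

    If the current-task gradient g_new conflicts with the replay gradient
    g_old (negative inner product), remove the component of g_new that
    points against g_old. Otherwise return g_new unchanged.
    """
    dot = np.sum(g_new * g_old)
    if dot < 0:
        g_new = g_new - dot / (np.sum(g_old * g_old) + 1e-12) * g_old
    return g_new

def init_lora_from_gradient(g, rank):
    """Factor a gradient matrix g (d_out x d_in) with truncated SVD.

    Returns B (d_out x rank) and A (rank x d_in) such that B @ A is the
    best rank-`rank` approximation of g. Splitting the singular values
    evenly between B and A is an illustrative choice.
    """
    U, S, Vt = np.linalg.svd(g, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])
    B = U[:, :rank] * sqrt_s           # scale columns of U
    A = sqrt_s[:, None] * Vt[:rank]    # scale rows of V^T
    return B, A

# Toy usage: a current-task gradient that mostly opposes the replay gradient.
rng = np.random.default_rng(0)
g_old = rng.standard_normal((6, 4))                  # replay-buffer gradient
g_new = -g_old + 0.1 * rng.standard_normal((6, 4))   # conflicting task gradient
g_rec = reconcile(g_new, g_old)
B, A = init_lora_from_gradient(g_rec, rank=2)
```

After reconciliation, `g_rec` is orthogonal to `g_old` up to numerical error, so the subspace chosen by the SVD no longer points directly against the replay gradient. How SLICE actually accumulates gradients and maps the factors onto the adapter weights is not stated in the text available here.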
