On Catastrophic Forgetting in Low-Rank Decomposition-Based Parameter-Efficient Fine-Tuning
Muhammad Ahmad, Jingjing Zheng, Yankai Cao

TL;DR
This paper investigates how low-rank decomposition methods for parameter-efficient fine-tuning affect catastrophic forgetting, revealing that subspace geometry and parameterization significantly influence continual learning performance.
Contribution
The paper provides an empirical analysis of low-rank PEFT methods in sequential learning, highlighting the importance of update subspace design and offering practical guidance for mitigating forgetting.
Findings
Tensor-based decompositions (e.g., LoRETTA) reduce forgetting by capturing richer structural information within ultra-compact parameter budgets.
Methods that restrict updates to small, shared matrix subspaces often suffer from task interference and increased forgetting.
Structurally aligned parameterizations (e.g., WeGeFT) help preserve pretrained representations.
Abstract
Parameter-efficient fine-tuning (PEFT) based on low-rank decomposition, such as LoRA, has become a standard approach for adapting large pretrained models. However, its behavior in sequential learning, specifically with regard to catastrophic forgetting, remains insufficiently understood. In this work, we present an empirical study showing that forgetting is strongly influenced by the geometry and parameterization of the update subspace. While methods that restrict updates to small, shared matrix subspaces often suffer from task interference, tensor-based decompositions (e.g., LoRETTA) mitigate forgetting by capturing richer structural information within ultra-compact budgets, and structurally aligned parameterizations (e.g., WeGeFT) preserve pretrained representations. Our findings highlight update subspace design as a key factor in continual learning and offer practical guidance for selecting…
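For readers unfamiliar with the low-rank update the abstract refers to, the following is a minimal NumPy sketch of a LoRA-style parameterization, in which a frozen weight W0 is adapted by adding a scaled product of two thin matrices. It is an illustration only, not the authors' code: the dimensions, the scaling value `alpha`, and the helper name `lora_delta` are assumptions chosen for the example, and the tensor-based (LoRETTA) and structurally aligned (WeGeFT) variants discussed in the paper are not shown.

```python
import numpy as np

def lora_delta(B, A, alpha=1.0):
    """Return the low-rank update (alpha / r) * B @ A.

    B has shape (d_out, r) and A has shape (r, d_in), so the update
    has rank at most r. `lora_delta` is a hypothetical helper name.
    """
    r = A.shape[0]
    return (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 32, 4          # illustrative sizes, not from the paper

W0 = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))        # trainable factor
B = np.zeros((d_out, r))                  # zero init, so the update starts at zero

delta = lora_delta(B, A, alpha=8.0)
W = W0 + delta                            # adapted weight used in the forward pass

# Trainable parameter counts: full fine-tuning vs. the low-rank factors.
full_params = d_out * d_in
lora_params = r * (d_out + d_in)

# With a non-zero B the update is still confined to a rank-r subspace.
B_trained = rng.standard_normal((d_out, r))
delta_trained = lora_delta(B_trained, A, alpha=8.0)
update_rank = np.linalg.matrix_rank(delta_trained)
```

Because every update lives in the column space of B and the row space of A, reusing the same small factors across a sequence of tasks means later tasks modify the same subspace that earlier tasks relied on, which is one way to picture the task interference the abstract describes.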
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
