Mitigating Forgetting in Low Rank Adaptation
Joanna Sliwa, Frank Schneider, Philipp Hennig, Jose Miguel Hernandez-Lobato

TL;DR
This paper introduces LaLoRA, a weight-space regularization method that uses a Laplace approximation to mitigate catastrophic forgetting in low-rank adaptation of large models, improving knowledge retention during fine-tuning.
Contribution
LaLoRA is a regularization technique that applies a Laplace approximation to the LoRA weights, balancing the learning of new tasks against the preservation of prior knowledge.
Findings
LaLoRA improves the learning-forgetting trade-off in fine-tuning large models.
The method allows direct control over forgetting through regularization strength.
It demonstrates robustness across different hyperparameters and data choices.
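The control over forgetting through the regularization strength can be read off a generic Laplace-regularized objective. The form below is a sketch of a standard EWC-style quadratic penalty, not necessarily the paper's exact loss; the symbols $\lambda$ (regularization strength), $\theta_0$ (the LoRA weights at which curvature is estimated), and $P$ (a curvature-based precision matrix, e.g. a Fisher approximation) are notation assumed here for illustration.

```latex
\mathcal{L}_{\lambda}(\theta)
  = \mathcal{L}_{\text{target}}(\theta)
  + \frac{\lambda}{2}\,(\theta - \theta_0)^{\top} P \,(\theta - \theta_0)
```

With $\lambda = 0$ this reduces to unregularized LoRA fine-tuning. Increasing $\lambda$ penalizes movement along high-curvature directions of $P$ more strongly, trading target-domain learning for retention.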
Abstract
Parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), enable fast specialization of large pre-trained models to different downstream applications. However, this process often leads to catastrophic forgetting of the model's prior domain knowledge. We address this issue with LaLoRA, a weight-space regularization technique that applies a Laplace approximation to Low-Rank Adaptation. Our approach estimates the model's confidence in each parameter and constrains updates in high-curvature directions, preserving prior knowledge while enabling efficient target-domain learning. By applying the Laplace approximation only to the LoRA weights, the method remains lightweight. We evaluate LaLoRA by fine-tuning a Llama model for mathematical reasoning and demonstrate an improved learning-forgetting trade-off, which can be directly controlled via the method's regularization…
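The mechanism the abstract describes (estimate curvature, then constrain updates in high-curvature directions) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a diagonal Fisher estimate built from per-sample gradients, treats the adapter weights as a flat list of floats, and all function and parameter names (`diagonal_fisher`, `laplace_penalty`, `regularized_step`, `strength`) are hypothetical.

```python
# Illustrative sketch only: diagonal-Fisher (EWC-style) penalty on a flat
# vector of adapter weights. Names and structure are assumptions, not the
# paper's code.

def diagonal_fisher(per_sample_grads):
    """Mean of squared per-sample gradients: a diagonal curvature estimate."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def laplace_penalty(theta, theta_ref, precision, strength):
    """0.5 * strength * sum_i precision_i * (theta_i - theta_ref_i)^2."""
    return 0.5 * strength * sum(
        p * (t - r) ** 2 for t, r, p in zip(theta, theta_ref, precision)
    )


def penalty_grad(theta, theta_ref, precision, strength):
    """Gradient of laplace_penalty with respect to theta."""
    return [strength * p * (t - r) for t, r, p in zip(theta, theta_ref, precision)]


def regularized_step(theta, task_grad, theta_ref, precision, strength, lr):
    """One gradient-descent step on task loss plus the quadratic penalty."""
    reg = penalty_grad(theta, theta_ref, precision, strength)
    return [t - lr * (g + r) for t, g, r in zip(theta, task_grad, reg)]


if __name__ == "__main__":
    # Toy source-data gradients: only the first weight has nonzero curvature.
    fisher = diagonal_fisher([[1.0, 0.0], [3.0, 0.0]])
    theta_ref = [0.0, 0.0]
    theta = [1.0, 1.0]
    print("fisher:", fisher)
    print("penalty:", laplace_penalty(theta, theta_ref, fisher, strength=2.0))
    print("step:", regularized_step(theta, [1.0, 1.0], theta_ref, fisher, 2.0, 0.1))
```

In the toy run, the first weight (high curvature) is pulled back toward its reference value much harder than the second (zero curvature), which moves only according to the task gradient. A Kronecker-factored curvature estimate would replace the per-coordinate `precision` list with a structured matrix; that variant is not shown here.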
Peer Reviews
Decision: Submitted to ICLR 2026
1. This paper is clearly written and well-motivated. The topic of combatting catastrophic forgetting is important.
2. The paper proposes an efficient approach to calculating the curvature information, specifically via Fisher information.
3. The experiments demonstrate that LaLoRA is effective in combatting forgetting.
1. The proposed method requires (a subset of) source data, which is typically unavailable for task-specific fine-tuning.
2. The significance of the proposed approach is questionable. In Figure 2(a), I find that the learning performance saturates around 2 epochs with very little forgetting. Thus, vanilla LoRA with early stopping is sufficient.
3. More baseline methods are needed, especially mentioned Bar, Flat-LoRA, etc.
[1] Implicit Regularization of Sharpness-Aware Minimization for Scale-Invar
- The idea is conceptually sound, combining Laplace-based uncertainty estimation with LoRA.
- The paper draws a clear connection to EWC-style continual learning while adapting it to PEFT.
- The method assumes the availability of source or surrogate data to estimate curvature, which is unrealistic for most LLM fine-tuning scenarios. The proposed "minimal proxy batches" solution only partially addresses this.
- No analysis of computational efficiency relative to vanilla LoRA is given; specifically, incorporating the cost from Stage I.
- It is unclear how much source-domain data is required for LaLoRA to perform well, or if the regularization is robust when limited data are availa
1. **Clear Motivation**: The application of the Laplace approximation specifically to LoRA adapters is well-justified, namely finding those less important weights in pretraining. Moreover, it is demonstrated to be efficient, and avoids the prohibitive computational cost of full-parameter curvature modeling. The two-stage algorithm is clearly described and mathematically well-formulated, e.g., in Equations (1)-(5), with careful distinction between diagonal and structured Kronecker-Factored (K-FAC
1. **Unclear Theoretical Guarantees and Some Ambiguous Symbolism**: The Laplace-regularized loss in Equation (5) and associated regularizer expression could be made clearer, with more rigorous notation for how $\overline{\Sigma}$ is estimated, especially in multi-dataset settings. Although the practical motivation for restricting the Laplace approximation to LoRA weights is strong, a more explicit analysis of the cases where this is justified (i.e., under what assumptions low-rank space alone su
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
