Norm-Bounded Low-Rank Adaptation

Ruigang Wang; Krishnamurthy Dvijotham; Ian R. Manchester

arXiv:2501.19050 · cs.LG · September 30, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces NB-LoRA, a novel low-rank adaptation method with explicit norm bounds that improves the robustness and performance of parameter-efficient fine-tuning on language and vision tasks.

Contribution

It proposes a new norm-bounded parameterization of low-rank adaptations that guarantees the prescribed norm bounds are satisfied and improves robustness over existing methods.

Findings

1. Matches or surpasses the performance of existing LoRA methods on language tasks.
2. Improves robustness to hyper-parameters such as rank, learning rate, and number of epochs.
3. Reduces catastrophic forgetting in vision fine-tuning.

Abstract

In this work, we propose norm-bounded low-rank adaptation (NB-LoRA) for parameter-efficient fine-tuning. NB-LoRA is a novel parameterization of low-rank weight adaptations that admits explicit bounds on each singular value of the adaptation matrix, and can thereby satisfy any prescribed unitarily invariant norm bound, including the Schatten norms (e.g., the nuclear, Frobenius, and spectral norms). The proposed parameterization is unconstrained, smooth, and complete, i.e., it covers all matrices satisfying the prescribed rank and singular-value bounds. Natural language generation experiments show that NB-LoRA matches or surpasses the performance of competing LoRA methods while exhibiting stronger hyper-parameter robustness. Vision fine-tuning experiments show that NB-LoRA can avoid catastrophic forgetting at only a minor cost in adaptation performance, and compared to existing approaches it is…
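The abstract's central claim, that bounding each singular value of a rank-$r$ update bounds every unitarily invariant norm, can be illustrated with a minimal NumPy sketch. This is an illustrative assumption, not the paper's NB-LoRA construction: it writes the update as $\Delta W = U \operatorname{diag}(s) V^\top$, builds $U$ and $V$ with orthonormal columns from unconstrained matrices via a Cayley-type map, and squashes each free parameter $\theta_i$ into $(0, \delta_i)$ with a sigmoid. The names `stiefel_cayley` and `bounded_lowrank` are hypothetical, and because a sigmoid never reaches its endpoints, the sketch makes no completeness claim.

```python
import numpy as np


def stiefel_cayley(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Map free X (r x r) and Y ((m - r) x r) to an m x r matrix with orthonormal columns.

    Cayley-type construction: Z = X - X^T + Y^T Y has symmetric part Y^T Y >= 0,
    so I + Z is always invertible and no constraint on X or Y is needed.
    """
    r = X.shape[0]
    eye = np.eye(r)
    Z = X - X.T + Y.T @ Y
    inv = np.linalg.inv(eye + Z)
    top = inv @ (eye - Z)      # r x r block
    bottom = -2.0 * Y @ inv    # (m - r) x r block
    return np.vstack([top, bottom])


def bounded_lowrank(params: dict, delta: np.ndarray) -> np.ndarray:
    """Hypothetical bounded update dW = U diag(s) V^T with s_i = delta_i * sigmoid(theta_i)."""
    U = stiefel_cayley(params["Xu"], params["Yu"])  # m x r, orthonormal columns
    V = stiefel_cayley(params["Xv"], params["Yv"])  # n x r, orthonormal columns
    s = delta / (1.0 + np.exp(-params["theta"]))    # each s_i lies in (0, delta_i)
    return U @ np.diag(s) @ V.T


rng = np.random.default_rng(0)
m, n, r = 8, 6, 3
params = {
    "Xu": rng.normal(size=(r, r)), "Yu": rng.normal(size=(m - r, r)),
    "Xv": rng.normal(size=(r, r)), "Yv": rng.normal(size=(n - r, r)),
    "theta": rng.normal(size=r),
}
delta = np.array([0.5, 0.3, 0.2])  # illustrative per-singular-value bounds

dW = bounded_lowrank(params, delta)
sv = np.linalg.svd(dW, compute_uv=False)
print("spectral norm:", sv.max(), "<", delta.max())
print("nuclear norm:", sv.sum(), "<", delta.sum())
```

Because $U$ and $V$ have orthonormal columns, the nonzero singular values of $\Delta W$ are exactly the $s_i$. Choosing the bounds so that $\sum_i \delta_i \le \delta$ then gives a nuclear-norm bound of $\delta$, choosing $\sqrt{\sum_i \delta_i^2} \le \delta$ gives a Frobenius-norm bound, and the spectral norm is always below $\max_i \delta_i$.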

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. **Clear motivation.** The paper targets a well-documented limitation of standard LoRA (small initial gradients and sensitivity to the learning rate) and proposes a theoretically grounded reparameterization that tightly controls singular values and the associated norms. The mapping is smooth and complete, covering all matrices within the prescribed rank and singular-value bounds.
2. **Thorough empirical study.** The evaluation spans both LLMs and ViTs, includes comparisons against multiple baselines, abl
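The "small initial gradients" limitation follows from how standard LoRA is initialized: the update is $\Delta W = BA$ with $A$ drawn at random and $B = 0$, so the adapter starts as an exact no-op. Writing $G = \partial L / \partial(\Delta W)$, the chain rule gives $\partial L/\partial A = B^\top G$ and $\partial L/\partial B = G A^\top$, so the first is exactly zero at step 0 and the second is scaled down by the small initialization of $A$. A minimal NumPy sketch, with arbitrary sizes, an arbitrary 0.01 init scale, and a random stand-in for $G$:

```python
import numpy as np

# Standard LoRA convention: dW = B @ A, with A random and B initialized to zero,
# so the adapter is an exact no-op at step 0. The 0.01 scale is arbitrary.
rng = np.random.default_rng(0)
d_out, d_in, r = 16, 16, 4
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

# G stands in for dL/d(dW), the loss gradient w.r.t. the full update;
# a random matrix is used purely for illustration.
G = rng.normal(size=(d_out, d_in))

grad_A = B.T @ G   # dL/dA = B^T G, identically zero while B = 0
grad_B = G @ A.T   # dL/dB = G A^T, shrunk by the small scale of A

print("||grad_A|| =", np.linalg.norm(grad_A))
print("||grad_B|| / ||G|| =", np.linalg.norm(grad_B) / np.linalg.norm(G))
```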

Weaknesses

1. **Additional hyperparameters.** NB-LoRA introduces additional controls (e.g., the norm bound $\delta$). Although the paper argues for improved robustness, this still expands the tuning surface in practice, and choosing $\delta$ involves its own trade-off between performance and robustness.
2. **Extra training time / GPU overhead.** The Cayley reparameterization (and its backward pass) adds measurable overhead relative to vanilla LoRA, although the paper reports that this overhead is minor.
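A standard inequality (not a result quoted from the paper) shows why $\delta$ trades adaptation capacity against staying close to the pretrained weights $W_0$: a spectral-norm bound on the update caps how far an adapted layer's output can move for any input $x$, and since the spectral norm is dominated by the Frobenius and nuclear norms, a bound of $\delta$ in either of those implies the same guarantee.

```latex
\|(W_0 + \Delta W)\,x - W_0\,x\|_2 = \|\Delta W\,x\|_2 \le \|\Delta W\|_2\,\|x\|_2 \le \delta\,\|x\|_2,
\qquad
\|\Delta W\|_2 \le \|\Delta W\|_F \le \|\Delta W\|_* .
```

A smaller $\delta$ therefore keeps each layer closer to its pretrained behavior, while a larger $\delta$ leaves more room to fit the new task.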

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- Turning norm-constrained low-rank adaptation into a smooth, complete reparameterization is novel and comes with clear theoretical underpinnings.
- The paper presents experiments on both language and vision models that suggest performance gains and reduced forgetting, accompanied by useful analyses of training dynamics.

Weaknesses

- Gradient-dynamics mismatch in analyses/plots: The method trains *free* parameters $\tilde A,\tilde B$ that map to $A,B$ through a Cayley transformation. Under a given learning rate, the *effective* update on $A,B$ is not simply $-\eta\nabla_{A,B}L$; it is mediated by the Jacobian of the reparameterization. Therefore, comparing raw gradient norms (or update magnitudes) across methods in different parameterizations can be misleading. A fair comparison should report the induced per-step update on
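This point can be made concrete with a toy scalar example (an illustrative assumption; NB-LoRA's actual map goes through a Cayley transform). If a bounded quantity $s = \delta\,\sigma(\theta)$ is trained through its free parameter $\theta$ with step size $\eta$, the chain rule gives $\partial L/\partial\theta = J\,\partial L/\partial s$ with $J = \delta\,\sigma(\theta)\,(1 - \sigma(\theta))$, so to first order the induced step on $s$ is $-\eta J^2\,\partial L/\partial s$ rather than $-\eta\,\partial L/\partial s$ (in the matrix case, $J^2$ becomes $J J^\top$). The sketch compares the naive step, the first-order prediction, and the step actually taken; all values are hypothetical.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical scalar example: a bounded quantity s = delta * sigmoid(theta)
# is trained through its free parameter theta, mimicking a reparameterized model.
delta, theta, eta = 1.0, 2.0, 0.1
grad_s = 1.0  # assumed dL/ds at the current point

# Chain rule: dL/dtheta = (ds/dtheta) * dL/ds, with ds/dtheta = delta * sig * (1 - sig).
sig = sigmoid(theta)
jac = delta * sig * (1.0 - sig)
grad_theta = jac * grad_s

# One SGD step on the free parameter, then map back to s.
s_before = delta * sig
s_after = delta * sigmoid(theta - eta * grad_theta)

naive_step = -eta * grad_s               # what "-eta * dL/ds" would suggest
first_order = -eta * jac ** 2 * grad_s   # first-order induced step: -eta * J J^T * dL/ds
actual_step = s_after - s_before

print("naive:", naive_step, "first-order:", first_order, "actual:", actual_step)
```

Here the induced step on $s$ is roughly two orders of magnitude smaller than the naive one, which is why raw gradient norms in different parameterizations are not directly comparable.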

Reviewer 03 · Rating 6 · Confidence 4

Strengths

**(S1)** The main insight (Thm. 4.2) is original and non-obvious. I find this an elegant solution.

**(S2)** The discussion of related work and positioning relative to it is excellent. The main paper discusses in detail the similarities and differences to DeLoRA and PiSSA, including detailed experiments on the attainable norm bounds (Fig. 4b) and on which matrices can be learned (Fig. 2).

**(S3)** The experiments are on point; to me, the main evaluation of robust PEFT methods is not that they surpass the p

Weaknesses

**(W1)** The experiments only consider the high-data regime (MetaMathQA = 395K samples, CodeFeedback = 66.5K samples). To test robustness, it would also be interesting to see whether the proposed method makes models more robust to overfitting in the low-data regime, for example on the DreamBooth task used in DeLoRA or a similar task.

**(W2)** Appendices E and F mention that the experiments use learning-rate schedulers (cosine for LLMs, one-cycle for ViTs). This is unfortunate because it obscures the effe
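Under a cosine scheduler, the "learning rate" reported in a sweep is only the peak of a curve that decays toward a floor, so the rate actually applied at each step depends on the schedule too. A minimal sketch of standard cosine annealing without warmup, using a hypothetical step count and peak rate, shows that the average rate over training is about half the nominal peak. One-cycle schedules likewise vary the rate over training, ramping it up and then annealing it back down. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Cosine annealing from peak_lr at step 0 down to min_lr at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total, peak = 1000, 2e-4  # hypothetical values, not the paper's settings
lrs = [cosine_lr(t, total, peak) for t in range(total + 1)]
avg_lr = sum(lrs) / len(lrs)
print("start:", lrs[0], "midpoint:", lrs[total // 2], "end:", lrs[-1], "average:", avg_lr)
```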
