TL;DR
This paper introduces $\alpha$-LoRA, a novel reparameterization method for fine-tuning pre-trained models that improves generalization, supported by theoretical analysis and experiments on large language models.
Contribution
It proposes a new reparameterization approach called $\alpha$-LoRA that enhances fine-tuning effectiveness and generalization in transfer learning.
Findings
Theoretical validation using Random Matrix Theory.
Improved fine-tuning performance on large language models.
Enhanced generalization ability of the proposed method.
Abstract
Fine-tuning has proven to be highly effective in adapting pre-trained models to perform better on new target tasks with minimal data samples. Among the most widely used approaches are reparameterization methods, which update a target module by augmenting its frozen weight matrix with an additional trainable weight matrix. The most prominent example is Low-Rank Adaptation (LoRA), which has gained significant attention in recent years. In this paper, we introduce a new class of reparameterization methods for transfer learning, designed to enhance the generalization ability of fine-tuned models. We establish the effectiveness of our approach in a high-dimensional binary classification setting using tools from Random Matrix Theory, and further validate our theoretical findings through more realistic experiments, such as fine-tuning LLMs.
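To make the reparameterization idea concrete, here is a minimal sketch in plain Python. It assumes the effective weight takes the form diag(alpha) · W0 + B · A, with one scale per output row of the frozen matrix W0. That form is inferred from the reviews' description of row-wise rescaling of the frozen weights and may differ from the paper's exact definition. The names `effective_weight`, `lora_a`, and `lora_b` are illustrative, not taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def effective_weight(w0, lora_b, lora_a, alpha):
    """Return diag(alpha) @ W0 + B @ A.

    alpha holds one scale per output row of the frozen matrix W0.
    This parameterization is an assumption; alpha = all ones recovers
    the usual LoRA form W0 + B @ A.
    """
    delta = matmul(lora_b, lora_a)  # low-rank update of rank len(lora_a)
    return [
        [a_i * w + d for w, d in zip(w_row, d_row)]
        for a_i, w_row, d_row in zip(alpha, w0, delta)
    ]


random.seed(0)
d_out, d_in, rank = 4, 6, 2  # toy sizes, chosen only for illustration
w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
lora_b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero, as in LoRA
lora_a = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]

plain = effective_weight(w0, lora_b, lora_a, [1.0] * d_out)   # standard LoRA
scaled = effective_weight(w0, lora_b, lora_a, [0.5] * d_out)  # rescaled base
```

With B initialized to zero, `plain` equals the frozen matrix, while `scaled` already differs from it before any training step. In this sketch, that is the extra degree of freedom the scale on the base weights provides.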
Peer Reviews
Decision·Submitted to ICLR 2026
- The RMT analysis is rigorous and provides closed-form expressions for optimal α* in the theoretical setting
- The deterministic equivalent framework is well-established and appropriately applied
- Proof structure is systematic
- The connection between α and task alignment β is intuitive and theoretically justified
- Novel theoretical insight: The optimal α* ≠ 1 result challenges the implicit assumption in LoRA that base weights should be preserved at their original scale
- Practical algorithm:
- Limited novelty over prior work: The idea of scaling frozen weights is simple; the main contribution is showing α* ≠ 1 theoretically. However:
  - DoRA already rescales weights (magnitude vs direction decomposition).
  - The row-wise scaling in Eq. 10 is reminiscent of adapter biases.
  - Learning α via separate optimization is similar to meta-learning approaches
- Modest empirical gains:
  - Table 2: Most improvements are <2%.
  - No significance tests or confidence intervals provided
  - Three seeds is minimal for
1. This paper raises an interesting point: investigating the scale factor of pretrained weights in LoRA fine-tuning.
2. Several theoretical analyses are provided to support this argument.
3. The paper is quite well-written.
1. The majority of the theoretical analysis focuses on binary classification problems, which differs from the actual training of large language models (LLMs).
2. The proposed idea is somewhat narrow, specifically concerning the scaling factor of pretrained weights.
3. The experiments are primarily conducted on GLUE; large-scale experiments on large LLMs are therefore needed.
- **(S1)** In the linear GMM setting, the analysis is careful and culminates in a closed-form $\alpha^*$ that depends on data-dependent scalars (via RMT). Figures show how the best $\alpha$ varies with alignment $\beta$ and dimension, matching intuition and theoretical findings.
- **(S2)** Row-wise rescaling of the frozen weights is architecture-agnostic and easy to add to existing LoRA pipelines; parameter overhead is negligible.
- **(S3)** The method demonstrates consistent empirical gains:
  - On Amazon
- **(W1)** The theoretical model used to analyze the proposed method makes several *strong* assumptions, including **(i)** a *linear* binary classifier trained with **squared-loss ridge regression**, **(ii)** data drawn from **spherical Gaussian mixtures** with identity covariance, **(iii)** a highly constrained source-to-target shift of the form $\mu_\beta=\beta \mu+\mu_\perp$ with a single alignment parameter and an orthogonal residual, and **(iv)** reliance on **high-dimensional asymptotics**
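
The shift model named in (W1), $\mu_\beta=\beta \mu+\mu_\perp$ with $\mu_\perp$ orthogonal to $\mu$, can be made concrete with a short simulation. The sketch below uses only the Python standard library. It assumes a symmetric two-class mixture $x = y\,\mu + z$ with $y \in \{-1, +1\}$ and $z \sim \mathcal{N}(0, I)$, which is one common reading of "spherical Gaussian mixtures with identity covariance" and is not confirmed by the excerpt. The dimension, sample sizes, value of $\beta$, and all function names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def shifted_mean(mu, beta, rng):
    """Return (mu_beta, mu_perp) with mu_beta = beta * mu + mu_perp.

    mu_perp is a random Gaussian vector with its component along mu
    removed (one Gram-Schmidt step), so dot(mu_perp, mu) is ~0.
    """
    g = [rng.gauss(0, 1) for _ in mu]
    coef = dot(g, mu) / dot(mu, mu)
    mu_perp = [gi - coef * mi for gi, mi in zip(g, mu)]
    mu_beta = [beta * mi + pi for mi, pi in zip(mu, mu_perp)]
    return mu_beta, mu_perp


def sample(mean, n, rng):
    """Draw n labelled points x = y * mean + z, y in {-1, +1}, z ~ N(0, I)."""
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = [y * m + rng.gauss(0, 1) for m in mean]
        data.append((x, y))
    return data


rng = random.Random(0)
dim, beta = 50, 0.7  # hypothetical values
mu = [rng.gauss(0, 1) for _ in range(dim)]
mu_beta, mu_perp = shifted_mean(mu, beta, rng)

base_task = sample(mu, 200, rng)         # task with mean direction mu
shifted_task = sample(mu_beta, 20, rng)  # task with mean direction mu_beta

# The projection of mu_beta onto mu recovers the alignment parameter beta.
alignment = dot(mu_beta, mu) / dot(mu, mu)
```

Because $\mu_\perp$ carries no component along $\mu$, the single scalar `alignment` captures everything the two tasks share. This is the sense in which the reviewer calls the shift highly constrained.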
