$\alpha$-LoRA: Effective Fine-Tuning via Base Model Rescaling

Aymane El Firdoussi; El Mahdi Chayti; Mohamed El Amine Seddik; Martin Jaggi

arXiv:2510.21345·cs.LG·October 27, 2025

3 Reviews

TL;DR

This paper introduces $\alpha$-LoRA, a novel reparameterization method for fine-tuning pre-trained models that improves generalization, supported by theoretical analysis and experiments on large language models.

Contribution

It proposes a new reparameterization approach, $\alpha$-LoRA, that enhances fine-tuning effectiveness and generalization in transfer learning.

Findings

01

Theoretical validation using Random Matrix Theory.

02

Improved fine-tuning performance on large language models.

03

Enhanced generalization ability of the proposed method.

Abstract

Fine-tuning has proven to be highly effective in adapting pre-trained models to perform better on new desired tasks with few data samples. Among the most widely used approaches are reparameterization methods, which update a target module by augmenting its frozen weight matrix with an additional trainable weight matrix. The most prominent example is Low-Rank Adaptation (LoRA), which has gained significant attention in recent years. In this paper, we introduce a new class of reparameterization methods for transfer learning, designed to enhance the generalization ability of fine-tuned models. We establish the effectiveness of our approach in a high-dimensional binary classification setting using tools from Random Matrix Theory, and further validate our theoretical findings through more realistic experiments, such as fine-tuning LLMs.
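Concretely, a reparameterized linear layer keeps the pre-trained weight $W_0$ frozen and adds a trainable low-rank product $BA$. The NumPy sketch below assumes that $\alpha$-LoRA multiplies each row of $W_0$ by a learnable scalar (the row-wise rescaling the reviews mention), giving $y = (\operatorname{diag}(\alpha) W_0)x + sBAx$. The function name `lora_forward` and the adapter constant `scaling` are illustrative assumptions, not the paper's code, and the exact form of its Eq. 10 may differ.

```python
import numpy as np


def lora_forward(x, W0, A, B, alpha=None, scaling=1.0):
    """Hypothetical reparameterized linear layer (illustrative sketch).

    Computes y = (diag(alpha) @ W0) x + scaling * B A x for a batch of inputs.

    x:       (batch, d_in) inputs
    W0:      (d_out, d_in) frozen pre-trained weight
    A:       (r, d_in) trainable down-projection
    B:       (d_out, r) trainable up-projection
    alpha:   optional (d_out,) row-wise scale on W0 (assumed alpha-LoRA form);
             None means alpha = 1 for every row, i.e. plain LoRA
    scaling: constant multiplier on the adapter (often lora_alpha / r in LoRA
             codebases; unrelated to the base-weight alpha here)
    """
    base = W0 if alpha is None else alpha[:, None] * W0  # rescale each output row
    return x @ base.T + scaling * (x @ A.T) @ B.T


# Usage: with B initialized to zero the adapter contributes nothing, so the
# output is exactly the (row-rescaled) pre-trained output.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
W0 = rng.standard_normal((3, 5))
A = rng.standard_normal((2, 5))
B = np.zeros((3, 2))
y = lora_forward(x, W0, A, B, alpha=np.array([1.0, 0.5, 2.0]))
```

Setting `alpha=None` recovers standard LoRA, where the base weights stay at their original scale; the claim that the optimal $\alpha^*$ differs from 1 is a statement about this extra degree of freedom.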

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 4·Confidence 4

Strengths

- The RMT analysis is rigorous and provides closed-form expressions for the optimal α* in the theoretical setting.
- The deterministic equivalent framework is well-established and appropriately applied.
- The proof structure is systematic.
- The connection between α and task alignment β is intuitive and theoretically justified.
- Novel theoretical insight: the optimal α* ≠ 1 result challenges the implicit assumption in LoRA that base weights should be preserved at their original scale.
- Practical algorithm:

Weaknesses

- Limited novelty over prior work: The idea of scaling frozen weights is simple; the main contribution is showing α* ≠ 1 theoretically. However:
  - DoRA already rescales weights (magnitude vs. direction decomposition).
  - The row-wise scaling in Eq. 10 is reminiscent of adapter biases.
  - Learning α via separate optimization is similar to meta-learning approaches.
- Modest empirical gains:
  - Table 2: Most improvements are <2%.
  - No significance tests or confidence intervals are provided.
  - Three seeds is minimal for
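To make the point about three seeds concrete: a two-sided 95% confidence interval for the mean of $n$ runs uses Student's $t$ quantile with $n-1$ degrees of freedom, which is about 4.30 for three seeds versus about 2.26 for ten. The standard-library sketch below computes such an interval. The small quantile table is hard-coded for illustration (`scipy.stats.t.ppf(0.975, df)` gives the same values), the helper name `mean_ci95` is hypothetical, and the scores are placeholders, not numbers from the paper.

```python
import math
import statistics

# 0.975 quantiles of Student's t distribution, keyed by degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262}


def mean_ci95(scores):
    """Return (mean, lower, upper) of a t-based 95% CI for the mean score."""
    df = len(scores) - 1
    if df not in T_975:
        raise ValueError(f"no t quantile tabulated for {len(scores)} runs")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))  # standard error
    half_width = T_975[df] * sem
    return mean, mean - half_width, mean + half_width


# Placeholder scores from three seeds (illustrative values only).
print(mean_ci95([1.0, 2.0, 3.0]))
```

With a sample standard deviation of 1 point across three seeds, the half-width is 4.303/√3 ≈ 2.48 points, which would already exceed gains under 2 points.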

Reviewer 02·Rating 2·Confidence 3

Strengths

1. This paper raises an interesting point: investigating the scale factor of pretrained weights in LoRA fine-tuning.
2. Several theoretical analyses are provided to support this argument.
3. The paper is quite well-written.

Weaknesses

1. The majority of the theoretical analysis focuses on binary classification problems, which differs from the actual training of large language models (LLMs).
2. The proposed idea is somewhat narrow, specifically concerning the scaling factor of pretrained weights.
3. The experiments are primarily conducted on GLUE; large-scale experiments on large LLMs are therefore needed.

Reviewer 03·Rating 4·Confidence 3

Strengths

- **(S1)** In the linear GMM setting, the analysis is careful and culminates in a closed-form $\alpha^*$ that depends on data-dependent scalars (via RMT). Figures show how the best α varies with alignment β and dimension, matching intuition and theoretical findings.
- **(S2)** Row-wise rescaling of the frozen weights is architecture-agnostic and easy to add to existing LoRA pipelines; parameter overhead is negligible.
- **(S3)** The method demonstrates consistent empirical gains:
  - On Amazon
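The "negligible parameter overhead" in (S2) follows from simple counting. For a frozen $d_{\text{out}} \times d_{\text{in}}$ weight, a rank-$r$ LoRA adapter trains $r(d_{\text{in}} + d_{\text{out}})$ parameters, while one scale per row of $W_0$ adds only $d_{\text{out}}$. The sketch assumes that row-wise form; the helper `adapter_param_counts` and the 4096 × 4096 layer with $r = 8$ are hypothetical choices for illustration.

```python
def adapter_param_counts(d_out, d_in, rank):
    """Frozen and trainable parameter counts for one linear layer (row-wise alpha assumed)."""
    return {
        "frozen": d_out * d_in,          # W0, not trained
        "lora": rank * (d_in + d_out),   # A: rank x d_in, B: d_out x rank
        "alpha": d_out,                  # one scale per row of W0
    }


counts = adapter_param_counts(4096, 4096, 8)
# {'frozen': 16777216, 'lora': 65536, 'alpha': 4096}
print(counts["alpha"] / counts["lora"])  # 0.0625
```

In this example the scales add 6.25% on top of the LoRA adapter's trainable parameters, and about 0.024% of the frozen layer.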

Weaknesses

- **(W1)** The theoretical model used to analyze the proposed method makes several *strong* assumptions, including **(i)** a *linear* binary classifier trained with **squared-loss ridge regression**, **(ii)** data drawn from **spherical Gaussian mixtures** with identity covariance, **(iii)** a highly constrained source-to-target shift of the form $\mu_\beta=\beta \mu+\mu_\perp$ with a single alignment parameter and an orthogonal residual, and **(iv)** reliance on **high-dimensional asymptotics**
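The theoretical setting described in (W1) can be simulated directly. The sketch below is one hypothetical instantiation, not the paper's exact estimator: labels $y \in \{\pm 1\}$, inputs $x = y\,m + z$ with $z \sim \mathcal{N}(0, I_p)$, source mean $\mu$, and target mean $\mu_\beta = \beta\mu + \mu_\perp$. A source classifier $w_s$ is fitted by squared-loss ridge regression, and the target classifier is assumed to be $w = \alpha w_s + a$, where only $a$ is fitted on the few target samples by minimizing $\frac{1}{n}\lVert X^\top(\alpha w_s + a) - y\rVert^2 + \gamma\lVert a\rVert^2$. Sweeping α then shows empirically whether α = 1 is the best choice. The function names, sizes, the norms of μ and μ_⊥, and γ are arbitrary assumptions.

```python
import numpy as np


def ridge(X, y, gamma, offset=None):
    """Return offset + a, with a minimizing (1/n)||X.T @ (offset + a) - y||^2 + gamma*||a||^2.

    X: (p, n) data matrix with samples as columns; y: (n,) labels in {-1, +1}.
    """
    p, n = X.shape
    w0 = np.zeros(p) if offset is None else offset
    residual = y - X.T @ w0
    a = np.linalg.solve(X @ X.T / n + gamma * np.eye(p), X @ residual / n)
    return w0 + a


def sample_gmm(mean, n, rng):
    """Two-class spherical Gaussian mixture: x = y * mean + N(0, I_p)."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = np.outer(mean, y) + rng.standard_normal((mean.size, n))
    return X, y


def alpha_sweep(p=200, n_src=2000, n_tgt=100, beta=0.5, gamma=1.0,
                alphas=(0.0, 0.5, 1.0, 1.5, 2.0), seed=0):
    """Target test accuracy of w = alpha * w_src + a for each alpha (toy setting)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(p)
    mu[0] = 2.0                       # source class mean
    mu_perp = np.zeros(p)
    mu_perp[1] = 1.0                  # component orthogonal to mu
    mu_beta = beta * mu + mu_perp     # shifted target mean
    Xs, ys = sample_gmm(mu, n_src, rng)
    w_src = ridge(Xs, ys, gamma)      # source classifier
    Xt, yt = sample_gmm(mu_beta, n_tgt, rng)
    Xte, yte = sample_gmm(mu_beta, 5000, rng)
    acc = {}
    for alpha in alphas:
        w = ridge(Xt, yt, gamma, offset=alpha * w_src)  # fit only the correction a
        acc[alpha] = float(np.mean(np.sign(Xte.T @ w) == yte))
    return acc


print(alpha_sweep())
```

Here α = 0 trains on the target data alone and α = 1 keeps the source classifier at its original scale; the RMT analysis described in (S1) replaces such a grid search with a closed-form $\alpha^*$.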
