LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning

Nurbek Tastan; Stefanos Laskaridis; Martin Takac; Karthik Nandakumar; Samuel Horvath

arXiv:2505.21289 · cs.LG · March 10, 2026

Open Access · 3 Reviews

TL;DR

LoFT introduces a low-rank adaptation method that mimics full fine-tuning by aligning optimizer dynamics, achieving better accuracy without extra hyperparameters or increased inference costs.

Contribution

LoFT is a low-rank adaptation technique that aligns the optimizer's first and second moments with those of full fine-tuning, removing the need to tune extra hyperparameters and improving accuracy over standard LoRA.

Findings

01

Significantly narrows the performance gap with full fine-tuning.

02

Outperforms standard LoRA methods consistently.

03

Maintains the same inference cost as LoRA.

Abstract

Large pre-trained models are commonly adapted to downstream tasks using parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA), which injects small trainable low-rank matrices instead of updating all weights. While LoRA dramatically reduces trainable parameters with little overhead, it can still underperform full fine-tuning in accuracy and often converges more slowly. We introduce LoFT, a novel low-rank adaptation method that behaves like full fine-tuning by aligning the optimizer's internal dynamics with those of updating all model weights. LoFT not only learns weight updates in a low-rank subspace (like LoRA) but also properly projects the optimizer's first and second moments (Adam's momentum and variance) into the same subspace, mirroring full-model updates. By aligning the low-rank update itself with the full update, LoFT eliminates the need for tuning extra…
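The idea of making a low-rank update mirror the full-model update can be illustrated with a small NumPy sketch. It assumes plain gradient descent rather than Adam, the usual LoRA parameterization `W = W0 + B @ A`, and a Gram-scaled step on `B`; the function name `projected_step_B` and all shapes are hypothetical, and this is not the paper's algorithm, which this summary does not spell out. Under those assumptions, the induced weight change is the full gradient step projected onto the row space of `A`, and it coincides with the full step once `A` has full rank.

```python
import numpy as np


def projected_step_B(G, A, lr):
    """Hypothetical helper: step for B (with A held fixed) such that
    delta_B @ A equals -lr * G projected onto the row space of A.

    G is the full-weight gradient dL/dW; by the chain rule dL/dB = G @ A.T.
    """
    gram = A @ A.T  # (r, r) Gram matrix of the fixed factor
    return -lr * (G @ A.T) @ np.linalg.pinv(gram)


rng = np.random.default_rng(0)
m, n, lr = 6, 4, 0.1
G = rng.standard_normal((m, n))  # stand-in for the full gradient dL/dW

# Low-rank case (r < n): the weight change is a projected full step.
r = 2
A = rng.standard_normal((r, n))
delta_W_low = projected_step_B(G, A, lr) @ A
P = A.T @ np.linalg.pinv(A @ A.T) @ A  # projector onto the row space of A

# Full-rank case (r = n): the projector is the identity, so the
# low-rank step reproduces the full gradient-descent step.
A_full = rng.standard_normal((n, n))
delta_W_full = projected_step_B(G, A_full, lr) @ A_full

print(np.allclose(delta_W_low, -lr * G @ P))
print(np.allclose(delta_W_full, -lr * G))
```

The sketch covers only the update direction. Matching Adam's first and second moments, as the abstract describes, would need additional state that is not modeled here.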

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

The authors provide a principled derivation for a LoRA method meant to explicitly mimic full fine-tuning updates. The approach recovers the correct dynamics in the full-rank limit. The authors provide extensive experiments showing the promise of the method, especially at low ranks. The method is practically efficient: it requires modest memory and runtime overhead and is simple to implement.

Weaknesses

The experiments are only conducted with $r \leq 32$ and models with $\leq 8$B parameters. Second-moment calibration appears to have low impact at a high cost; however, it is still valuable to derive and test this idea. It is unclear whether alternation is helpful.
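To make the alternation question concrete, a short worked expansion may help. It assumes the usual LoRA parameterization $W = W_0 + BA$ with updates $\Delta B$ and $\Delta A$ to the two factors; these symbols are assumptions, since this summary does not give the paper's notation.

```latex
\begin{aligned}
(B + \Delta B)(A + \Delta A) - BA &= \Delta B\,A + B\,\Delta A + \Delta B\,\Delta A && \text{(joint update)} \\
(B + \Delta B)\,A - BA &= \Delta B\,A && \text{(update } B \text{ only)} \\
B\,(A + \Delta A) - BA &= B\,\Delta A && \text{(update } A \text{ only)}
\end{aligned}
```

Updating one factor at a time removes the cross term $\Delta B\,\Delta A$, which is second order in the step size. Whether that helps in practice is an empirical matter.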

Reviewer 02 · Rating 6 · Confidence 2

Strengths

**Strong conceptual motivation**: The paper identifies a previously underexplored source of suboptimality in LoRA (optimizer state misalignment) and provides a well-motivated correction grounded in optimization theory.

**Methodological completeness**: The framework integrates multiple components (gradient projection, alternating updates, moment calibration) into a cohesive, well-defined optimizer (LoFT-AdamW), which provably reduces to full fine-tuning when rank = full.

**Theoretical insight

Weaknesses

**Missing citation and discussion of concurrent work**: The Alternating Updates component (Building Block 1) reproduces an idea conceptually similar to AltLoRA [1], which independently proposed alternating optimization of low-rank factors to eliminate second-order coupling in LoRA updates. The absence of a citation or discussion of AltLoRA is a notable omission, especially since the “alternating update” mechanism is presented as a key innovation. This should be acknowledged as concurrent or par

Reviewer 03 · Rating 4 · Confidence 2

Strengths

- **Substantive technical contribution with theory.** The paper proposes a concrete improvement over standard LoRA-style adaptation and backs it up with clear derivations/analysis. The core ideas are technically motivated (e.g., aligning updates with full fine-tuning dynamics), and the method’s components are explained rather than presented as ad-hoc tricks.
- **Broad empirical validation across domains.** Experiments cover multiple modalities/datasets (e.g., language and vision) and a range of

Weaknesses

- **Gap between theory and the strongest claim.** While the derivations are compelling, there remains a gap between the formal analysis and the paper’s strongest claim(s) (e.g., exact equivalence to full fine-tuning/AdamW under certain limits). A precise theorem with assumptions, or a more cautious phrasing, would strengthen the work.
- **LLM evaluation is too basic.** The large-language-model experiments rely on relatively easy, small benchmarks. For a model like Llama-3-8B, a more representati


Taxonomy

Topics: Image and Signal Denoising Methods · Advanced Adaptive Filtering Techniques