Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation

Fei Wu; Jia Hu; Geyong Min; Shiqiang Wang

arXiv:2505.11235 · cs.LG · February 20, 2026

3 Reviews

TL;DR

This paper introduces PSOFT, a novel parameter-efficient fine-tuning method that confines orthogonal transformations to the principal subspace of pre-trained models, balancing semantic preservation and adaptability across NLP and CV tasks.

Contribution

PSOFT constructs a principal subspace for orthogonal transformations, providing a scalable, theoretically grounded approach that enhances expressiveness and efficiency in PEFT.

Findings

1. Outperforms existing PEFT methods on 35 NLP and CV tasks.
2. Maintains semantic integrity while improving model adaptability.
3. Achieves a balance between efficiency and expressiveness.

Abstract

Driven by the rapid growth of model parameters, parameter-efficient fine-tuning (PEFT) has become essential for adapting large models to diverse downstream tasks under constrained computational resources. Within this paradigm, orthogonal fine-tuning and its variants preserve semantic representations of pre-trained models, but struggle to achieve both expressiveness and efficiency in terms of parameter counts, memory, and computation. To overcome this limitation, we propose efficient Orthogonal Fine-Tuning with Principal Subspace adaptation (PSOFT), which confines orthogonal transformations to the principal subspace of pre-trained weights. Specifically, PSOFT constructs this subspace via matrix decomposition to enable compatible transformations with higher effective rank, establishes a theoretical condition that strictly maintains the geometry of this subspace for essential semantic…
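The construction described in the abstract can be illustrated with a minimal NumPy sketch. It assumes an SVD as the matrix decomposition, takes A' = U_r (orthonormal columns) and B' = S_r V_r^T with a residual W_res = W - A'B', and parametrizes the r x r orthogonal matrix R with the Cayley transform known from OFT. These choices, and the helper names `principal_split`, `cayley`, and `psoft_weight`, are hypothetical; the paper's exact factorization and parametrization may differ.

```python
import numpy as np


def principal_split(W: np.ndarray, r: int):
    """Split W into a rank-r principal part A' @ B' plus a residual W_res via SVD.

    Assumed (hypothetical) factorization: A' = U_r with orthonormal columns,
    B' = diag(S_r) @ V_r^T. The paper may distribute S_r differently.
    """
    U, S, Vh = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r]                      # d x r, orthonormal columns (frozen)
    B = S[:r, None] * Vh[:r, :]       # r x k, top singular values folded in (frozen)
    W_res = W - A @ B                 # residual outside the principal subspace (frozen)
    return A, B, W_res


def cayley(Q: np.ndarray) -> np.ndarray:
    """Cayley transform: skew-symmetric Q -> orthogonal R = (I - Q)^{-1} (I + Q)."""
    I = np.eye(Q.shape[0])
    return np.linalg.solve(I - Q, I + Q)


def psoft_weight(A, B, W_res, P):
    """Adapted weight A' R B' + W_res, where R is orthogonal by construction."""
    R = cayley(P - P.T)               # P - P^T is skew-symmetric for any square P
    return A @ R @ B + W_res


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))     # stand-in for a pre-trained weight
A, B, W_res = principal_split(W, r=8)
P = np.zeros((8, 8))                  # the only trainable tensor in this sketch
print(np.allclose(psoft_weight(A, B, W_res, P), W))   # True: R = I at init
```

Initializing P at zero gives R = I, so adaptation starts exactly from the pre-trained weight, and only the r x r matrix P is trained while A', B', and W_res stay frozen (again an assumption of this sketch, not a statement of the paper's training setup).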

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating: 6 · Confidence: 4

Strengths

The idea is conceptually elegant and well-motivated. Combining SVD-based low-rank decomposition with a learnable rotation matrix is a natural way to balance efficiency and expressiveness. The orthogonal projection helps preserve the representational capacity of the subspace, while the rotation matrix allows fine-grained adaptation to downstream tasks. The paper is clearly written and easy to follow. The method is presented in a straightforward manner with comprehensive experimental validation a…

Weaknesses

1. The related work section lacks a clear comparison and differentiation between PSOFT and closely related methods such as **DoRA** and **LoRA-XS**. A more detailed discussion highlighting conceptual and empirical differences would strengthen the contribution.
2. The reported ranks $r$ (e.g., 46, 354, 424) are irregular and vary considerably. This suggests possible **sensitivity to hyperparameter tuning**, which should be verified through an ablation study on $r$.
3. It appears that al…

Reviewer 02 · Rating: 6 · Confidence: 3

Strengths

* The core idea of applying orthogonal transformations within a low-rank principal subspace is an intuitive and effective way to bridge the gap between LoRA and OFT.
* The comprehensive evaluation across 35 NLP and CV tasks shows strong performance. PSOFT is not only accurate but also highly parameter- and memory-efficient, crucially avoiding the out-of-memory (OOM) errors that plague other OFT variants.
* The paper successfully highlights that efficiency is multi-dimensional. PSOFT shows clear…

Weaknesses

1. **Theory vs. Practice:** The paper emphasizes "strict" geometry preservation as a key benefit, yet the best-performing algorithm intentionally relaxes this condition with tunable vectors to improve results. Could you discuss the trade-off here and the impact of this relaxation on the semantic preservation you aim for?
2. **"Effective Rank" Definition:** The claim of a "higher effective rank" is based on a non-standard definition (`r_PSOFT = √M`). This is confusing and weakens the claim of h…
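The trade-off raised in point 1 can be made concrete with a small numerical check. The sketch assumes the same hypothetical SVD split as a plain linear-algebra exercise (A' = U_r, B' = S_r V_r^T) and measures how far the pairwise column inner products W^T W drift after adaptation. Where the paper's tunable vectors actually enter is not stated here, so placing diagonal scaling vectors `alpha` and `beta` on both sides of R is purely an illustrative assumption, as is the helper name `gram_drift`.

```python
import numpy as np


def gram_drift(W: np.ndarray, W_new: np.ndarray) -> float:
    """Largest absolute change in the pairwise column inner products W^T W."""
    return float(np.abs(W_new.T @ W_new - W.T @ W).max())


rng = np.random.default_rng(1)
d, k, r = 64, 32, 8
W = rng.standard_normal((d, k))                      # stand-in pre-trained weight
U, S, Vh = np.linalg.svd(W, full_matrices=False)
A, B = U[:, :r], S[:r, None] * Vh[:r, :]             # assumed split: A' orthonormal
W_res = W - A @ B

R, _ = np.linalg.qr(rng.standard_normal((r, r)))     # orthogonal up to round-off
strict = A @ R @ B + W_res

# Hypothetical relaxation: tunable vectors as diagonal scalings around R
alpha = 1.0 + 0.1 * rng.standard_normal(r)
beta = 1.0 + 0.1 * rng.standard_normal(r)
R_relaxed = alpha[:, None] * R * beta[None, :]       # diag(alpha) @ R @ diag(beta)
relaxed = A @ R_relaxed @ B + W_res

print(f"strict drift:  {gram_drift(W, strict):.1e}")   # round-off level
print(f"relaxed drift: {gram_drift(W, relaxed):.1e}")  # nonzero once scalings leave 1
```

With an exactly orthogonal R the drift stays at floating-point round-off, while generic non-unit scalings make it grow; how much that drift costs in semantic preservation is the empirical question the reviewer asks the authors to quantify.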

Reviewer 03 · Rating: 2 · Confidence: 4

Strengths

1. This paper rethinks fine-tuning updates within the principal subspace of the original weight matrix.
2. The method demonstrates strong results, consistently outperforming LoRA, PiSSA, and OFT variants (BOFT, GOFT) on a wide range of benchmarks, including GLUE, VTAB-1K, GSM-8K, and commonsense reasoning.
3. PSOFT achieves its strong performance while using significantly fewer parameters than competitors. In Table 2, PSOFT and GOFT (0.08M) achieve the best average performance while…

Weaknesses

1. A mischaracterization of the core idea of the proposed algorithm: the key idea of "orthogonal fine-tuning" is keeping the correlations between neurons (e.g., their inner products) unchanged after fine-tuning, thus providing semantic preservation. However, this does not hold for PSOFT. In fact, PSOFT is an additive fine-tuning method similar to LoRA, $\Delta W = W_{\text{final}} - W_{\text{pre}} = (A'RB' + W_{\text{res}}) - (A'B' + W_{\text{res}}) = A'(R-I)B'$, where the updates are restricted to the principal subspace and enjoy a natural…
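The identity in this point can be checked numerically. The sketch below forms W_final = A'RB' + W_res for two hypothetical ways of splitting the top-r SVD factors between A' and B', confirms that ΔW = A'(R - I)B' (an additive update of rank at most r), and tests whether the column inner products W^T W survive. Neither split is claimed to be the paper's; the split names and the random R are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, r = 64, 32, 8
W = rng.standard_normal((d, k))                      # stand-in pre-trained weight
U, S, Vh = np.linalg.svd(W, full_matrices=False)
R, _ = np.linalg.qr(rng.standard_normal((r, r)))     # a random r x r orthogonal matrix

# Two hypothetical splits of the top-r SVD factors into A' (d x r) and B' (r x k)
splits = {
    "orthonormal_A": (U[:, :r], S[:r, None] * Vh[:r, :]),
    "sqrt_shared": (U[:, :r] * np.sqrt(S[:r]), np.sqrt(S[:r])[:, None] * Vh[:r, :]),
}

results = {}
for name, (A, B) in splits.items():
    W_res = W - A @ B                                # same residual for both splits
    W_final = A @ R @ B + W_res
    dW = W_final - W
    identity_ok = np.allclose(dW, A @ (R - np.eye(r)) @ B)   # Delta W = A'(R - I)B'
    rank = np.linalg.matrix_rank(dW)                          # at most r
    gram_ok = np.allclose(W_final.T @ W_final, W.T @ W)       # column inner products kept?
    results[name] = (bool(identity_ok), int(rank), bool(gram_ok))
    print(name, results[name])
```

The additive form holds for both splits. Whether inner products are preserved depends on the split: with orthonormal A', W_final = (U_r R U_r^T + I - U_r U_r^T) W, which is a full orthogonal matrix applied to W, so W^T W is unchanged; when the singular values are shared between A' and B', A'^T A' = S_r and R^T S_r R generally differs from S_r, so the inner products change.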