TL;DR
This paper introduces PSOFT, a novel parameter-efficient fine-tuning method that confines orthogonal transformations to the principal subspace of pre-trained models, balancing semantic preservation and adaptability across NLP and CV tasks.
Contribution
PSOFT constructs a principal subspace for orthogonal transformations, providing a scalable, theoretically grounded approach that enhances expressiveness and efficiency in PEFT.
Findings
Outperforms existing PEFT methods on 35 NLP and CV tasks.
Maintains semantic integrity while improving model adaptability.
Achieves a balance between efficiency and expressiveness.
Abstract
Driven by the rapid growth of model parameters, parameter-efficient fine-tuning (PEFT) has become essential for adapting large models to diverse downstream tasks under constrained computational resources. Within this paradigm, orthogonal fine-tuning and its variants preserve semantic representations of pre-trained models, but struggle to achieve both expressiveness and efficiency in terms of parameter counts, memory, and computation. To overcome this limitation, we propose efficient Orthogonal Fine-Tuning with Principal Subspace adaptation (PSOFT), which confines orthogonal transformations to the principal subspace of pre-trained weights. Specifically, PSOFT constructs this subspace via matrix decomposition to enable compatible transformations with higher effective rank, establishes a theoretical condition that strictly maintains the geometry of this subspace for essential semantic…
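Below is a minimal NumPy sketch of the construction the abstract describes. It assumes, as one reviewer summarizes, that the principal subspace comes from a truncated SVD of the pre-trained weight. Several pieces are illustrative choices rather than the paper's method: splitting the singular values symmetrically (√S) between A' and B', parametrizing the orthogonal matrix R with a Cayley transform, and the hypothetical helper names `principal_split`, `skew_from_params`, and `psoft_weight`. The paper's theoretical condition on the subspace and its tunable relaxation vectors are not modeled.

```python
import numpy as np

def principal_split(W: np.ndarray, r: int):
    """Split W into a rank-r principal part A' @ B' plus a residual W_res (hypothetical helper)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(S[:r])
    A = U[:, :r] * sqrt_s            # A': d_out x r, columns scaled by sqrt(sigma_i) (assumed split)
    B = sqrt_s[:, None] * Vt[:r]     # B': r x d_in, rows scaled by sqrt(sigma_i)
    W_res = W - A @ B                # frozen residual outside the principal subspace
    return A, B, W_res

def skew_from_params(theta: np.ndarray, r: int) -> np.ndarray:
    """Build an r x r skew-symmetric matrix from r(r-1)/2 trainable entries."""
    Q = np.zeros((r, r))
    Q[np.triu_indices(r, k=1)] = theta
    return Q - Q.T

def cayley(Q: np.ndarray) -> np.ndarray:
    """Cayley transform: maps a skew-symmetric Q to an orthogonal R = (I - Q)(I + Q)^-1."""
    eye = np.eye(Q.shape[0])
    return (eye - Q) @ np.linalg.inv(eye + Q)

def psoft_weight(A, B, W_res, theta):
    """Adapted weight: the orthogonal R acts only inside the r-dimensional principal subspace."""
    R = cayley(skew_from_params(theta, A.shape[1]))
    return A @ R @ B + W_res

# Usage: only theta (r(r-1)/2 values) would be trained; A', B' and W_res stay frozen.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))
r = 8
A, B, W_res = principal_split(W, r)
theta = np.zeros(r * (r - 1) // 2)      # zero init gives R = I, so training starts exactly at W
W_adapted = psoft_weight(A, B, W_res, theta)
```

The Cayley map is used here only because it guarantees an exactly orthogonal R from unconstrained parameters. Other parametrizations, such as a matrix exponential of a skew-symmetric matrix, would serve the same role in this sketch.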
Peer Reviews
Decision: ICLR 2026 Poster
The idea is conceptually elegant and well-motivated. Combining SVD-based low-rank decomposition with a learnable rotation matrix is a natural way to balance efficiency and expressiveness. The orthogonal projection helps preserve the representational capacity of the subspace, while the rotation matrix allows fine-grained adaptation to downstream tasks. The paper is clearly written and easy to follow. The method is presented in a straightforward manner with comprehensive experimental validation a
1. The related work section lacks a clear comparison and differentiation between PSOFT and closely related methods such as **DoRA** and **LoRA-XS**. A more detailed discussion highlighting conceptual and empirical differences would strengthen the contribution. 2. The reported ranks $r$ (e.g., 46, 354, 424) are irregular and seem to vary considerably. This suggests possible **sensitivity to hyperparameter tuning**, which should be verified through an ablation study on $r$. 3. It appears that al
* The core idea of applying orthogonal transformations within a low-rank principal subspace is an intuitive and effective way to bridge the gap between LoRA and OFT. * The comprehensive evaluation across 35 NLP and CV tasks shows strong performance. PSOFT is not only accurate but also highly parameter- and memory-efficient, crucially avoiding the out-of-memory (OOM) errors that plague other OFT variants. * The paper successfully highlights that efficiency is multi-dimensional. PSOFT shows clear
1. **Theory vs. Practice:** The paper emphasizes "strict" geometry preservation as a key benefit, yet the best-performing algorithm intentionally relaxes this condition with tunable vectors to improve results. Could you discuss the trade-off here and the impact of this relaxation on the semantic preservation you aim for? 2. **"Effective Rank" Definition:** The claim of a "higher effective rank" is based on a non-standard definition ($r_{\text{PSOFT}} = \sqrt{M}$). This is confusing and weakens the claim of h
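One way to read the reviewer's second point is as parameter-budget arithmetic. The sketch below assumes that M is the number of trainable parameters per adapted layer and that R is counted as a dense r × r matrix, which is what makes r = √M. A skew-symmetric parametrization would need only r(r − 1)/2 parameters. Under those assumptions, it compares the rank that budget buys with the rank LoRA reaches, since LoRA spends r(d_out + d_in) parameters per unit of rank. Both the interpretation of M and the function names `psoft_rank` and `lora_rank` are hypothetical.

```python
import math

def psoft_rank(budget: int) -> int:
    """Largest r with r * r <= budget, i.e. a dense r x r matrix fits in M parameters (assumed r = sqrt(M))."""
    return math.isqrt(budget)

def lora_rank(budget: int, d_out: int, d_in: int) -> int:
    """Largest LoRA rank r with r * (d_out + d_in) <= budget."""
    return budget // (d_out + d_in)

if __name__ == "__main__":
    M = 65536                     # illustrative per-layer budget, not a number from the paper
    for d in (768, 4096):
        print(d, psoft_rank(M), lora_rank(M, d, d))
    # prints "768 256 42", then "4096 256 8"
```

Under this reading, √M is a budget-relative upper bound on the rank of the update rather than a spectral effective rank. That is the distinction the reviewer asks the authors to make explicit.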
1. This paper rethinks fine-tuning updates within the principal subspace of the original weight matrix. 2. The method demonstrates strong results, consistently outperforming LoRA, PiSSA, and other OFT variants (BOFT, GOFT) on a wide range of benchmarks, including GLUE, VTAB-1K, GSM-8K, and commonsense reasoning. 3. PSOFT achieves its strong performance while using significantly fewer parameters than competitors. In Table 2, PSOFT and GOFT (0.08M) achieve the best average performance while
1. Misstated core idea of the proposed algorithm: The key idea of "orthogonal fine-tuning" is to keep correlations between neurons (e.g., their inner products) unchanged after fine-tuning, thereby preserving semantics. However, this does not hold for PSOFT. In fact, PSOFT is an additive fine-tuning method similar to LoRA, $\Delta W = W_{\text{final}} - W_{\text{pre}} = (A'RB' + W_{\text{res}}) - (A'B' + W_{\text{res}}) = A'(R-I)B'$, where the updates are restricted to the principal subspace and enjoy a natural
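The reviewer's algebra can be checked numerically. The sketch below uses generic random factors A', B', W_res and a random orthogonal R. This is an assumption: these are not the SVD-derived factors PSOFT actually uses. With them, it confirms that ΔW = A'(R − I)B' holds and has rank at most r. It then contrasts this with an OFT-style update W' = R_full W_pre (R_full is a hypothetical full-size orthogonal matrix), which preserves inner products between the columns (neurons) of the weight. Because the factors are generic, the check cannot say whether PSOFT's specific factorization and condition recover that property. That is the question the review raises.

```python
import numpy as np

def random_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Orthogonal factor of the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

rng = np.random.default_rng(0)
d_out, d_in, r = 32, 24, 4
A = rng.standard_normal((d_out, r))          # A' (generic, not SVD-derived)
B = rng.standard_normal((r, d_in))           # B'
W_res = rng.standard_normal((d_out, d_in))   # frozen residual
R = random_orthogonal(r, rng)                # r x r orthogonal transform

W_pre = A @ B + W_res
W_final = A @ R @ B + W_res
delta = W_final - W_pre

# Reviewer's identity: the update is additive and confined to a rank-r subspace.
identity_holds = np.allclose(delta, A @ (R - np.eye(r)) @ B)
update_rank = np.linalg.matrix_rank(delta)

# OFT-style update: rotating the whole weight preserves the Gram matrix of its columns.
R_full = random_orthogonal(d_out, rng)
W_oft = R_full @ W_pre
oft_gram_preserved = np.allclose(W_oft.T @ W_oft, W_pre.T @ W_pre)

# With generic factors, the additive PSOFT-form update changes the column Gram matrix.
psoft_gram_preserved = np.allclose(W_final.T @ W_final, W_pre.T @ W_pre)
print(identity_holds, update_rank <= r, oft_gram_preserved, psoft_gram_preserved)
```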
