Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Chenghao Fan; Zhenyi Lu; Sichen Liu; Chengfeng Gu; Xiaoye Qu; Wei Wei; Yu Cheng

arXiv:2502.16894 · cs.CL · March 4, 2026

TL;DR

This paper introduces GOAT, a framework that enhances Low-Rank Adaptation (LoRA) for large models by adaptively integrating priors and aligning optimization, significantly improving performance across diverse tasks without changing existing architectures.

Contribution

GOAT proposes an adaptive SVD-based MoE framework and a theoretical scaling factor to improve LoRA's efficiency and performance, bridging the gap with full fine-tuning.

Findings

01. GOAT achieves state-of-the-art results on 25 datasets.

02. Proper scaling boosts LoRA MoE performance without architectural changes.

03. GOAT closes the performance gap with full fine-tuning.

Abstract

While Low-Rank Adaptation (LoRA) enables parameter-efficient fine-tuning for Large Language Models (LLMs), its performance often falls short of Full Fine-Tuning (Full FT). Current methods optimize LoRA by initializing it with static subsets of a singular value decomposition (SVD), which makes suboptimal use of pre-trained knowledge. Another path for improving LoRA is to incorporate a Mixture-of-Experts (MoE) architecture. However, weight misalignment and complex gradient dynamics make it challenging to apply SVD priors to the LoRA MoE architecture. To mitigate these issues, we propose Great LoRA Mixture-of-Expert (GOAT), a framework that (1) adaptively integrates relevant priors using an SVD-structured MoE, and (2) aligns optimization with a fully fine-tuned MoE by deriving a theoretical scaling factor. We demonstrate that proper scaling, without…
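As a rough illustration of what an "SVD-structured MoE" over LoRA experts could look like, the NumPy sketch below initializes each low-rank expert from a different segment of the pre-trained weight's SVD and mixes the experts with a top-k router. This is a sketch under stated assumptions, not the authors' implementation: the function names, the evenly spaced choice of segments, the square-root split of singular values between the two factors, and the `scale` argument are all hypothetical. In particular, `scale` is only a placeholder for the paper's theoretical scaling factor, whose value is not given in this listing, and the sketch applies no correction to the frozen weight for the non-zero initial expert output.

```python
import numpy as np

def svd_segment_experts(W0, num_experts, rank):
    """Build one (B, A) low-rank pair per expert from distinct SVD segments of W0.

    Assumption: segments are spaced evenly across the spectrum; the paper's
    actual segment selection is not described in this listing.
    """
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    starts = np.linspace(0, len(S) - rank, num_experts).astype(int)
    experts = []
    for s in starts:
        idx = slice(s, s + rank)
        sqrt_s = np.sqrt(S[idx])
        B = U[:, idx] * sqrt_s          # (d_out, rank)
        A = sqrt_s[:, None] * Vt[idx]   # (rank, d_in)
        experts.append((B, A))
    return experts

def moe_lora_forward(x, W0, experts, W_router, top_k, scale):
    """Frozen base projection plus a top-k gated mixture of low-rank experts.

    `scale` is a hypothetical stand-in for the paper's scaling factor.
    """
    logits = x @ W_router                                   # (batch, num_experts)
    top = np.argsort(-logits, axis=-1)[:, :top_k]           # selected expert ids
    top_logits = np.take_along_axis(logits, top, axis=-1)
    top_logits = top_logits - top_logits.max(axis=-1, keepdims=True)
    gates = np.exp(top_logits)
    gates = gates / gates.sum(axis=-1, keepdims=True)       # softmax over top-k

    out = x @ W0.T                                          # frozen pre-trained path
    for e, (B, A) in enumerate(experts):
        # Per-sample gate for expert e; zero where e was not selected.
        w = np.where(top == e, gates, 0.0).sum(axis=-1)
        out = out + scale * w[:, None] * (x @ A.T @ B.T)
    return out
```

With `scale=0.0` the layer reduces to the frozen projection `x @ W0.T`, which is a convenient sanity check when experimenting with different scaling choices.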

Code & Models

Repositories

facico/goat-peft (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education

Methods: Mixture of Experts · ADaptive gradient method with the OPTimal convergence rate