A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models

Mengyang Sun; Yihao Wang; Tao Feng; Dan Zhang; Yifan Zhu; Jie Tang

arXiv:2502.15828·cs.LG·February 25, 2025

TL;DR

This paper introduces a novel training strategy for Mixture-of-Low-Rank-Adapters (MoE-LoRA) that enhances robustness and feature learning in fine-tuning foundation models, leveraging Riemannian preconditioners for stability.

Contribution

It proposes a Riemannian preconditioning-based training method for MoE-LoRA, improving robustness and effectiveness in fine-tuning large models.
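To make the idea of preconditioning concrete, here is a minimal NumPy sketch of one scaled gradient step for a single LoRA pair, assuming the form commonly associated with Riemannian preconditioning of LoRA: the gradient of each factor is rescaled by the inverse Gram matrix of the other factor. The function name `preconditioned_sgd_step`, the damping term `delta`, and the toy least-squares loss are illustrative assumptions. This is not the paper's MoE-specific training strategy, which this page does not spell out.

```python
import numpy as np

def preconditioned_sgd_step(A, B, grad_A, grad_B, lr=0.1, delta=1e-6):
    """One scaled gradient step for a LoRA pair with update matrix B @ A.

    Assumed shapes: A is (r, d_in), B is (d_out, r).
    delta is a hypothetical damping term that keeps the r x r systems invertible.
    """
    r = A.shape[0]
    eye = np.eye(r)
    # A-step: left-multiply grad_A by (B^T B + delta*I)^{-1}
    new_A = A - lr * np.linalg.solve(B.T @ B + delta * eye, grad_A)
    # B-step: right-multiply grad_B by (A A^T + delta*I)^{-1}
    # (solved through the transpose, since the Gram matrix is symmetric)
    new_B = B - lr * np.linalg.solve(A @ A.T + delta * eye, grad_B.T).T
    return new_A, new_B

# Toy problem (assumption): fit a fixed target T with loss 0.5 * ||B A - T||_F^2
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
T = rng.normal(size=(d_out, d_in))

def toy_grads(A, B):
    G = B @ A - T              # gradient of the loss w.r.t. the product B A
    return B.T @ G, G @ A.T    # chain rule: grad_A, grad_B

grad_A, grad_B = toy_grads(A, B)
A1, B1 = preconditioned_sgd_step(A, B, grad_A, grad_B, delta=0.0)

# The same product B A, factored at a different scale
c = 10.0
A_s, B_s = A / c, B * c
gA_s, gB_s = toy_grads(A_s, B_s)
A2, B2 = preconditioned_sgd_step(A_s, B_s, gA_s, gB_s, delta=0.0)
```

With `delta=0.0`, the updated product `B1 @ A1` matches `B2 @ A2` even though the two factorizations differ in scale by `c`. Plain gradient descent on `A` and `B` does not have this property, which is one common motivation for this kind of preconditioner.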

Findings

01 Enhanced stability during training and inference.

02 Improved performance across downstream tasks.

03 Effective with SGD and AdamW optimizers.

Abstract

To streamline the fine-tuning of foundation models, Low-Rank Adapters (LoRAs) have been widely adopted across various fields, including instruction tuning and domain adaptation. The underlying concept of LoRA is to decompose a full-rank matrix into the product of two lower-rank matrices, which reduces storage consumption and accelerates training. Furthermore, to address the limited expressive capacity of LoRA, the Mixture-of-Experts (MoE) approach has been introduced to incorporate multiple LoRA adapters. Integrating LoRA experts leads to a visible improvement across several downstream scenarios. However, the mixture of LoRAs (MoE-LoRA) still exhibits low robustness during tuning and inference. Inspired by Riemannian Preconditioners, which train LoRA as a sub-space projector, we propose a new training strategy for MoE-LoRA to stabilize and boost its…
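As a rough illustration of the two ideas the abstract describes (a low-rank update B A added to a frozen weight, and a gated mixture of several such updates), here is a minimal NumPy sketch. The dimensions, the softmax router `W_gate`, the dense (non-sparse) gating, and the zero initialization of `B` are illustrative assumptions, not details taken from the paper or its repository.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, n_experts = 8, 6, 2, 3   # toy sizes (assumption)

# Frozen pretrained weight; only the adapters and router would be trained.
W = rng.normal(size=(d_out, d_in))

# Expert i holds A_i (rank x d_in) and B_i (d_out x rank).
# B starts at zero, a common LoRA convention, so the adapters are a no-op at first.
A = rng.normal(size=(n_experts, rank, d_in)) * 0.01
B = np.zeros((n_experts, d_out, rank))

# Hypothetical linear router that scores each expert per input.
W_gate = rng.normal(size=(n_experts, d_in))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_lora_forward(x):
    """x: (batch, d_in) -> (batch, d_out)."""
    gates = softmax(x @ W_gate.T)                   # (batch, n_experts)
    base = x @ W.T                                  # frozen path
    h = np.einsum("erd,bd->ber", A, x)              # project down: A_i x
    delta = np.einsum("eor,ber->beo", B, h)         # project up: B_i (A_i x)
    return base + np.einsum("be,beo->bo", gates, delta)

x = rng.normal(size=(4, d_in))
y = moe_lora_forward(x)
gates = softmax(x @ W_gate.T)

# Trainable parameters per expert vs. a full-rank update of the same layer
lora_params = rank * (d_in + d_out)
full_params = d_in * d_out
```

In this toy setting each expert stores `rank * (d_in + d_out)` values instead of `d_in * d_out`; the saving grows with layer width, which is where the reduced storage mentioned in the abstract comes from.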

Code & Models

Repositories

thudm/moelora_riemannian (PyTorch, official)

Taxonomy

Topics: Dam Engineering and Safety

Methods: Stochastic Gradient Descent · AdamW