OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning

Jinyuan Feng; Zhiqiang Pu; Tianyi Hu; Dongmin Li; Xiaolin Ai; Huimu Wang

arXiv:2501.10062 · cs.LG · July 22, 2025

TL;DR

This paper introduces OMoE, an orthogonal training method for mixture-of-experts models that enhances diversity among experts, leading to improved performance and efficiency in parameter-efficient fine-tuning tasks.

Contribution

The paper proposes OMoE, a novel orthogonal finetuning approach that promotes expert diversity in MoE models without increasing memory or computational costs.

Findings

01. OMoE improves model performance on commonsense reasoning benchmarks.

02. It reduces the number of experts needed for comparable or better results.

03. The method maintains the learning objective while enforcing expert diversity.

Abstract

Building a mixture-of-experts (MoE) architecture for low-rank adaptation (LoRA) is emerging as a promising direction in parameter-efficient fine-tuning (PEFT) because of its modular design and strong performance. However, simply increasing the number of experts does not guarantee significant improvement. In this work, we first conduct a qualitative analysis indicating that experts collapse to similar representations in vanilla MoE, which limits the capacity of the modular design and its computational efficiency. Furthermore, our analysis reveals that the performance of previous MoE variants may be limited by a lack of diversity among experts. Motivated by these findings, we propose Orthogonal Mixture-of-Experts (OMoE), a resource-efficient MoE variant that trains experts in an orthogonal manner to promote diversity. In OMoE, a Gram-Schmidt process is leveraged to enforce that the experts' representations lie…
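The abstract is cut off before it describes how the Gram-Schmidt step is applied, so the following is only a generic illustration of Gram-Schmidt orthogonalization, not the authors' implementation. It assumes each expert produces one representation vector per token. The names gram_schmidt and expert_outputs and the shapes used are hypothetical.

```python
import numpy as np


def gram_schmidt(expert_outputs, eps=1e-8):
    """Orthonormalize the rows of a (num_experts, hidden_dim) array.

    Generic modified Gram-Schmidt. How OMoE applies it during training
    is not stated in the truncated abstract, so this is an assumption.
    """
    basis = []
    for v in expert_outputs:
        w = np.asarray(v, dtype=np.float64).copy()
        # Remove the components that lie along the already accepted directions.
        for u in basis:
            w -= np.dot(w, u) * u
        norm = np.linalg.norm(w)
        # A (near-)zero remainder means this row duplicates earlier rows;
        # keep a zero vector so the output shape matches the input.
        basis.append(w / norm if norm > eps else np.zeros_like(w))
    return np.stack(basis)


# Hypothetical example: 4 experts, 16-dimensional representations.
rng = np.random.default_rng(0)
expert_outputs = rng.normal(size=(4, 16))
ortho = gram_schmidt(expert_outputs)

# Rows are now mutually orthogonal unit vectors, so the Gram matrix
# is (numerically) the identity.
gram = ortho @ ortho.T
```

In this sketch, two experts that emit the same vector leave a zero remainder for the second one, which is one simple way to see the collapse to similar representations that the abstract describes.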

Taxonomy

Topics: Advanced Vision and Imaging · Image Enhancement Techniques

Methods: Mixture of Experts