GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors

Hengyuan Zhang; Xinrong Chen; Yingmin Qiu; Xiao Liang; Ziyue Li; Guanyu Wang; Weiping Li; Tong Mo; Hayden Kwok-Hay So; Ngai Wong

arXiv:2506.14646·cs.CL·September 23, 2025

TL;DR

GuiLoMo introduces a bilevel optimization-based method for adaptive layer-wise allocation of expert numbers and ranks in LoRA-MoE models, improving performance and diversity in parameter-efficient fine-tuning of large language models.

Contribution

It proposes GuiLoMo, a novel approach using GuidedSelection Vectors and bilevel optimization for fine-grained expert configuration in LoRA-MoE, addressing limitations of uniform expert assignment.

Findings

1. Consistently outperforms baseline methods across multiple benchmarks.
2. Adaptive expert configuration varies meaningfully across layers and tasks.
3. Provides insights into the relationship between expert allocation and model performance.

Abstract

Parameter-efficient fine-tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), offer an efficient way to adapt large language models at reduced computational cost. However, their performance is limited by the small number of trainable parameters. Recent work combines LoRA with Mixture-of-Experts (MoE), i.e., LoRA-MoE, to enhance capacity, but two limitations still hinder the full exploitation of its potential: 1) expert numbers are assigned without accounting for the influence of downstream tasks, and 2) rank is assigned uniformly across all LoRA experts, which restricts representational diversity. To close these gaps, we propose GuiLoMo, a fine-grained, layer-wise strategy for allocating expert numbers and ranks with GuidedSelection Vectors (GSVs). GSVs are learned via a prior bilevel optimization process to capture both model- and task-specific needs, and are then used to allocate…
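
To make the two allocation dimensions in the abstract concrete (how many experts a layer gets, and what rank each expert has), the following is a minimal sketch that counts the trainable parameters of a single LoRA-MoE layer. It is not taken from the paper or its repository: the function name `lora_moe_trainable_params`, the bias-free linear router, and the example dimensions and ranks are all assumptions chosen for illustration.

```python
def lora_moe_trainable_params(d_in, d_out, expert_ranks, with_router=True):
    """Count trainable parameters of one LoRA-MoE layer (illustrative only).

    expert_ranks is a list with one rank per LoRA expert, so its length is
    the number of experts and its entries may differ from each other.
    """
    # Each LoRA expert i holds A_i (r_i x d_in) and B_i (d_out x r_i).
    expert_params = sum(r * (d_in + d_out) for r in expert_ranks)
    # Assumption: a linear router with one logit per expert and no bias.
    router_params = d_in * len(expert_ranks) if with_router else 0
    return expert_params + router_params


# Hypothetical configurations for a 4096 x 4096 projection.
uniform = lora_moe_trainable_params(4096, 4096, [8, 8, 8, 8])  # 4 experts, rank 8 each
varied = lora_moe_trainable_params(4096, 4096, [16, 8, 4])     # 3 experts, mixed ranks
print(uniform, varied)
```

Under these assumptions, a layer's budget depends only on the sum of its expert ranks plus a small router term, so a uniform setting fixes the same budget everywhere, while a per-layer choice of expert count and ranks can distribute it unevenly across layers.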

Code & Models

Repositories

liar406/gui-lomo (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Algorithms · Reservoir Engineering and Simulation Methods