Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning
Ziyu Zhao, Yixiao Zhou, Zhi Zhang, Didi Zhu, Tao Shen, Zexi Li, Jinluan Yang, Xuwu Wang, Jing Su, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng

TL;DR
This paper introduces SMoRA, a multi-task learning method that treats each LoRA rank as an expert, improving knowledge sharing and task performance while activating fewer parameters.
Contribution
The paper unifies single LoRA and multi-LoRA MoE in one framework and proposes SMoRA, which improves multi-task learning through dynamic rank-wise activation.
Findings
Finer-grained LoRA partitioning improves performance across tasks.
SMoRA activates fewer parameters but achieves better results.
Dynamic rank-wise activation enhances knowledge sharing.
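Dynamic rank-wise activation can be illustrated with a minimal sketch in which every rank of a single LoRA is gated individually. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear router `W_router`, top-k selection followed by a softmax over the selected ranks, the helper names `topk_rank_gates` and `rankwise_lora_forward`, and the toy dimensions.

```python
import math
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def topk_rank_gates(scores, k):
    """Keep the k highest-scoring ranks, softmax over them, zero out the rest."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


def rankwise_lora_forward(A, B, W_router, x, k):
    """Hypothetical rank-gated LoRA update: B @ (gates * (A @ x))."""
    gates = topk_rank_gates(matvec(W_router, x), k)  # one gate per rank
    h = [hi * gi for hi, gi in zip(matvec(A, x), gates)]
    return matvec(B, h), gates


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumed). B is random here only so the output is non-trivial;
# standard LoRA initialises B to zero.
rng = random.Random(0)
d_in, d_out, rank, k = 8, 6, 16, 4
A = rand_matrix(rank, d_in, rng)         # rank x d_in
B = rand_matrix(d_out, rank, rng)        # d_out x rank
W_router = rand_matrix(rank, d_in, rng)  # one router score per rank
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

delta, gates = rankwise_lora_forward(A, B, W_router, x, k)
active = [i for i, g in enumerate(gates) if g > 0.0]
print("active ranks:", active)
```

In this sketch only `k` of the `rank` rows of `A` and columns of `B` contribute to a given input, which is the sense in which each rank behaves like an expert. A different input can select a different subset of ranks.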
Abstract
Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. However, vanilla LoRA struggles with task conflicts in multi-task scenarios. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across related tasks. In this paper, we establish a connection between single LoRA and multi-LoRA MoE, integrating them into a unified framework. We demonstrate that the dynamic routing of multiple LoRAs is functionally equivalent to rank partitioning and block-level activation within a single LoRA. We further empirically demonstrate that finer-grained LoRA partitioning, within the…
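The equivalence between routing over multiple LoRAs and block-level activation within a single LoRA can be checked numerically on toy matrices. The sketch below is not the authors' code: the function names, the sizes, and the gate values are assumptions chosen for illustration. It compares a gated sum over N rank-r LoRA experts with one LoRA of rank N*r whose rank dimension is gated block by block.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def moe_of_loras(A_list, B_list, gates, x):
    """Multi-LoRA MoE: y = sum_i g_i * B_i (A_i x), skipping unrouted experts."""
    y = [0.0] * len(B_list[0])
    for A_i, B_i, g in zip(A_list, B_list, gates):
        if g == 0.0:
            continue
        out = matvec(B_i, matvec(A_i, x))
        y = [yi + g * oi for yi, oi in zip(y, out)]
    return y


def single_lora_block_gated(A_list, B_list, gates, x):
    """One LoRA of rank N*r with each expert's gate applied to its block of r ranks."""
    r = len(A_list[0])
    # Stack the A_i row-wise: (N*r) x d_in.
    A = [row for A_i in A_list for row in A_i]
    # Concatenate the B_i column-wise: d_out x (N*r).
    B = [sum((B_i[j] for B_i in B_list), []) for j in range(len(B_list[0]))]
    # Repeat each expert's gate across its r ranks.
    g_full = [g for g in gates for _ in range(r)]
    h = [hi * gi for hi, gi in zip(matvec(A, x), g_full)]
    return matvec(B, h)


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_in, d_out, r, n_experts = 6, 5, 2, 4   # toy sizes (assumed)
A_list = [rand_matrix(r, d_in, rng) for _ in range(n_experts)]
B_list = [rand_matrix(d_out, r, rng) for _ in range(n_experts)]
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
gates = [0.7, 0.0, 0.3, 0.0]             # e.g. a top-2 routing decision (assumed)

y_moe = moe_of_loras(A_list, B_list, gates, x)
y_single = single_lora_block_gated(A_list, B_list, gates, x)
max_diff = max(abs(a - b) for a, b in zip(y_moe, y_single))
print("max abs difference:", max_diff)
```

The two outputs agree up to floating-point rounding because both compute the same sum of gated rank-one terms. Shrinking the block size from r to 1 gives the rank-wise case, where each rank has its own gate.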
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification
Methods: ADaptive gradient method with the OPTimal convergence rate · Mixture of Experts
