GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration
Ting Bai, Yue Yu, Le Huang, Zenan Xu, Chuan Shi

TL;DR
This paper introduces GMoE, a graph-based MoE framework that improves expert collaboration and stability in LLM fine-tuning, addressing load imbalance issues with novel routing and coordination strategies.
Contribution
GMoE presents a new graph router and coordination strategies for MoE, enhancing expert collaboration and stability during LLM fine-tuning with parameter-efficient methods.
Findings
GMoE outperforms baseline models on multiple benchmarks.
The graph routing improves expert collaboration.
Coordination strategies increase model stability.
Abstract
The sparse Mixture-of-Experts (MoE) architecture of large language models (LLMs) confronts an inherent issue of load imbalance arising from the simplistic linear router strategy, which ultimately causes instability and inefficient learning in LLMs. To address this challenge, we introduce a novel MoE graph-based framework, GMoE, aimed at enhancing the collaboration among multiple experts. In GMoE, a graph router function is designed to capture the collaboration signals among experts. This enables all experts to dynamically allocate information derived from input data by sharing information with their neighboring experts. Moreover, we put forward two coordination strategies in GMoE to further release the capacity of each expert and increase the model…
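The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical ring graph over four experts, replaces the learned graph router with a single round of neighbor averaging over linear router scores, and uses made-up names (`linear_scores`, `graph_mix`, `top_k_gates`, `route`, `expert_load`). It only shows how scores shared between neighboring experts feed a top-k gate, and how per-expert load can be counted.

```python
import math
import random


def linear_scores(token, weights):
    """Plain linear router: one score per expert (dot product with its weight row)."""
    return [sum(t * w for t, w in zip(token, row)) for row in weights]


def graph_mix(scores, adjacency):
    """One round of neighbor averaging over the expert graph (self-loop included).

    Assumption: a stand-in for a learned graph router, not the paper's function.
    """
    mixed = []
    for i, row in enumerate(adjacency):
        neighbors = [j for j, edge in enumerate(row) if edge or j == i]
        mixed.append(sum(scores[j] for j in neighbors) / len(neighbors))
    return mixed


def top_k_gates(scores, k):
    """Keep the k highest-scoring experts and apply a softmax over them."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def route(token, weights, adjacency, k=2):
    """Map one token to {expert index: gate weight} using graph-mixed scores."""
    return top_k_gates(graph_mix(linear_scores(token, weights), adjacency), k)


def expert_load(tokens, weights, adjacency, k=2):
    """Count how many tokens are routed to each expert."""
    load = [0] * len(weights)
    for token in tokens:
        for i in route(token, weights, adjacency, k):
            load[i] += 1
    return load


# Hypothetical setup: 4 experts on a ring, 8-dimensional tokens, random weights.
rng = random.Random(0)
num_experts, dim = 4, 8
weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
ring = [
    [1 if abs(i - j) in (1, num_experts - 1) else 0 for j in range(num_experts)]
    for i in range(num_experts)
]
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(100)]
load = expert_load(tokens, weights, ring)
print("tokens per expert:", load)
```

Comparing `load` against the counts obtained by skipping `graph_mix` gives a rough feel for how neighbor sharing changes the token distribution, though nothing here reproduces the paper's coordination strategies or results.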
Taxonomy
Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Biomedical Text Mining and Ontologies
Methods: Mixture of Experts
