On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

Dongyang Fan; Bettina Messmer; Nikita Doikov; Martin Jaggi

arXiv:2409.13931·cs.LG·May 30, 2025

TL;DR

CoMiGS introduces a novel federated learning approach for on-device large language models, combining generalist and specialist experts via bi-level optimization to adapt to resource and data heterogeneity, enhancing personalization and privacy.

Contribution

CoMiGS is presented as the first method to address both computational-resource heterogeneity and data heterogeneity in on-device federated LLMs, using a mixture-of-experts framework trained with bi-level optimization.

Findings

1. Balances general and personalized knowledge effectively.
2. Remains robust against overfitting, thanks to the regularizing effect of the generalist experts.
3. Adapts to local data through specialist experts.

Abstract

On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate private learning with scarce data, Federated Learning has become a standard approach. However, it faces challenges such as computational resource heterogeneity and data heterogeneity among end users. We propose CoMiGS (Collaborative learning with a Mixture of Generalists and Specialists), the first approach to address both challenges. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, where the router is optimized using a separate validation set to ensure alignment with the target distribution. We solve our objective with alternating minimization, for which we provide a theoretical analysis. Our method shares generalist experts across…
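The alternating scheme described in the abstract can be illustrated with a toy, single-user sketch. This is not the authors' implementation: it assumes a scalar regression model with one "generalist" weight `w_g`, one "specialist" weight `w_s`, and a single router logit `a` that mixes them through a sigmoid. All names (`mixture_loss`, `mixture_grads`, `alternating_minimization`) and the squared-error loss are hypothetical choices made for illustration. The only idea taken from the abstract is the split of roles: expert weights are updated on the training set, while the router is updated on a separate validation set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mixture_loss(params, data):
    """Mean squared error of the mixed prediction (alpha*w_g + (1-alpha)*w_s) * x."""
    w_g, w_s, a = params
    alpha = sigmoid(a)
    w = alpha * w_g + (1.0 - alpha) * w_s
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mixture_grads(params, data):
    """Gradients of mixture_loss with respect to (w_g, w_s, a)."""
    w_g, w_s, a = params
    alpha = sigmoid(a)
    w = alpha * w_g + (1.0 - alpha) * w_s
    # Gradient with respect to the effective mixed weight w.
    g_w = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    # Chain rule through the mixture and the sigmoid gate.
    g_wg = g_w * alpha
    g_ws = g_w * (1.0 - alpha)
    g_a = g_w * (w_g - w_s) * alpha * (1.0 - alpha)
    return g_wg, g_ws, g_a


def alternating_minimization(train, val, init=(1.0, 0.0, 0.0), steps=500, lr=0.1):
    """Alternate one expert step on `train` with one router step on `val`.

    The experts start from different values (an assumption of this sketch);
    with identical experts the router gradient would be zero.
    """
    w_g, w_s, a = init
    for _ in range(steps):
        # Lower level: update expert weights on training data, router fixed.
        g_wg, g_ws, _ = mixture_grads((w_g, w_s, a), train)
        w_g -= lr * g_wg
        w_s -= lr * g_ws
        # Upper level: update the router on validation data, experts fixed.
        _, _, g_a = mixture_grads((w_g, w_s, a), val)
        a -= lr * g_a
    return w_g, w_s, a


if __name__ == "__main__":
    # Synthetic data from y = 2x, split into disjoint train and validation sets.
    train = [(i / 10, 2 * i / 10) for i in range(-10, 11, 2)]
    val = [(i / 10, 2 * i / 10) for i in range(-9, 10, 2)]
    start = (1.0, 0.0, 0.0)
    final = alternating_minimization(train, val, init=start)
    print("validation loss before:", mixture_loss(start, val))
    print("validation loss after: ", mixture_loss(final, val))
```

Using first-order alternating steps keeps the sketch short; it sidesteps differentiating through the lower-level solution, which a full bi-level treatment would otherwise require. How the generalist weights are shared across users is not covered here.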

Code & Models

Repositories

epfml/comigs (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Mixture of Experts