FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models

Annemette Brok Pirchert, Jacob Nielsen, Mogens Henrik From, Lukas Galke Poech, and Peter Schneider-Kamp

arXiv:2602.08818·cs.LG·February 10, 2026

Open Access

TL;DR

FlexMoRE introduces a flexible mixture of experts with variable ranks, optimizing performance and memory efficiency in federated large language models by tailoring expert size to task complexity.

Contribution

The paper proposes FlexMoRE, a novel mixture-of-experts architecture with rank-heterogeneous experts, demonstrating improved performance and efficiency over full-sized expert models.

Findings

01. The optimal expert rank varies with task type.

02. FlexMoRE outperforms full-sized experts on downstream tasks.

03. Memory savings are significant, with accuracy maintained or improved.

Abstract

Recent advances in mixture-of-experts architectures have shown that individual expert models can be trained federatedly, i.e., in isolation from other experts, by using a common base model to facilitate coordination. However, we hypothesize that full-sized experts may not be necessary for all domains and that low-rank adapters may be sufficient instead. Here, we introduce FlexMoRE, a Flexible Mixture of Rank-heterogeneous Experts, in which each expert may be either a full-sized expert or an adapter of a suitable rank. We systematically investigate the trade-off between expert rank and downstream task performance by evaluating $6$ experts with ranks $2^{0}$ to $2^{14}$, resulting in experiments covering 150 mixtures (96 with 2 experts, 54 with 7 experts) that are evaluated across $120$ tasks. For our experiments, we build on FlexOlmo and turn its pre-trained experts into low-rank versions. Our regression…
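The abstract does not say how the pre-trained experts are turned into low-rank versions. As a rough illustration only, the sketch below assumes one common approach: take the difference between an expert's weight matrix and the shared base weights, and keep its top singular directions via a truncated SVD. The function names `low_rank_adapter` and `adapter_param_count`, the toy dimensions, and the choice of SVD itself are assumptions for illustration, not the authors' confirmed procedure.

```python
import numpy as np


def low_rank_adapter(w_expert, w_base, rank):
    """Approximate (w_expert - w_base) by a rank-`rank` product b @ a.

    Assumption: truncated SVD of the weight difference; the paper's
    actual conversion procedure may differ.
    """
    delta = w_expert - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]  # shape (d_out, rank), singular values folded in
    a = vt[:rank, :]            # shape (rank, d_in)
    return b, a


def adapter_param_count(d_out, d_in, rank):
    """Parameters stored by the two adapter factors b and a."""
    return rank * (d_out + d_in)


# Toy example with hypothetical sizes: the expert differs from the base
# by an update that is exactly rank 4, so a rank-4 adapter recovers it.
rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 4
w_base = rng.standard_normal((d_out, d_in))
update = rng.standard_normal((d_out, rank)) @ rng.standard_normal((rank, d_in))
w_expert = w_base + update

b, a = low_rank_adapter(w_expert, w_base, rank)
recon_error = np.linalg.norm(w_base + b @ a - w_expert)
full_params = d_out * d_in
adapter_params = adapter_param_count(d_out, d_in, rank)
```

In this toy setting the adapter stores `rank * (d_out + d_in)` values instead of `d_out * d_in`, which is where the memory saving of a low-rank expert comes from. For real experts the weight difference is generally not exactly low-rank, so the reconstruction error grows as the rank shrinks; that is the rank versus performance trade-off the abstract describes.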

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Domain Adaptation and Few-Shot Learning · Expert finding and Q&A systems