Training-Free Dynamic Upcycling of Expert Language Models

Eros Fanì; Oğuzhan Ersoy

arXiv:2603.29765·cs.LG·April 1, 2026

TL;DR

DUME is a training-free, scalable method that builds a multi-domain expert language model by dynamically reusing dense experts. It relies on a closed-form ridge regression solution in place of additional training.
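To make "closed-form ridge regression" concrete, here is a minimal sketch of how a linear routing matrix could be fitted in one step, with no gradient updates. This is not taken from the DUME repository: the function names `fit_ridge_router` and `route`, the regularization strength `lam`, the one-hot domain targets, and the toy features are all assumptions chosen for illustration, and the summary above does not state which quantity DUME actually solves for.

```python
import numpy as np


def fit_ridge_router(features, domain_ids, num_domains, lam=1.0):
    """Solve W = (X^T X + lam * I)^{-1} X^T Y in closed form.

    Hypothetical helper: X holds one feature vector per row and Y is the
    one-hot encoding of each row's domain label (an assumed target).
    """
    X = np.asarray(features, dtype=np.float64)
    Y = np.eye(num_domains)[np.asarray(domain_ids)]
    gram = X.T @ X + lam * np.eye(X.shape[1])
    # Solve the linear system instead of forming an explicit inverse.
    return np.linalg.solve(gram, X.T @ Y)


def route(W, features):
    """Pick the domain with the highest linear score for each row."""
    return np.argmax(np.asarray(features) @ W, axis=1)


# Toy data (assumed): two well-separated domains in a 3-dimensional space.
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
domain_ids = np.repeat([0, 1], 50)
features = centers[domain_ids] + 0.1 * rng.standard_normal((100, 3))

W = fit_ridge_router(features, domain_ids, num_domains=2)
predicted = route(W, features)
```

Because the solution depends only on the sums `X.T @ X` and `X.T @ Y`, a fit of this kind can in principle be recomputed cheaply when data for a new domain arrives. Whether DUME adds experts this way is not stated in the summary.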

Contribution

The paper introduces DUME, an approach for building multi-domain language models without extra training. DUME outperforms baselines and allows new experts to be added dynamically.

Findings

01

DUME retains up to 97.6% of a dense expert model's performance in a specific domain.

02

DUME surpasses the dense expert on reasoning tasks, reaching 102.1% of its performance.

03

The method is cost-efficient, scalable, and can be fine-tuned for further improvements.

Abstract

Large Language Models (LLMs) have achieved remarkable performance on a wide range of specialized tasks, exhibiting strong problem-solving capabilities. However, training these models is prohibitively expensive, and they often lack domain-specific expertise because they rely on general knowledge datasets. Expertise finetuning can address this issue; however, it often leads to overspecialization, and developing a single multi-domain expert remains difficult due to diverging objectives. Furthermore, multitask training is challenging due to interference and catastrophic forgetting. Existing work proposes combining the expertise of dense models within a Mixture of Experts (MoE) architecture, although this approach still requires multitask finetuning. To address these issues, we introduce Dynamic Upcycling MoE (DUME), a novel approach that reuses dense experts trained on different domains to…

Code & Models

Repositories

gensyn-ai/dume (GitHub)
