Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

Sanae Lotfi; Lucas Caccia; Alessandro Sordoni; Jordan T. Ash; Miroslav Dudik

arXiv:2603.03535·cs.LG·March 5, 2026

Open Access

TL;DR

This paper empirically compares ensembling, merging, and routing strategies for combining parameter-efficient experts (language models fine-tuned with lightweight adapters). It finds that non-uniform ensembling and merging improve performance, and that routing yields greater gains than either in multi-task learning.

Contribution

It provides a comprehensive analysis of trade-offs among ensembling, merging, and routing techniques, highlighting the advantages of non-uniform approaches and efficient expert selection methods.

Findings

01. Non-uniform ensembling and merging improve performance.

02. Routing offers greater gains than ensembling or merging.

03. Clustering and greedy selection reduce routing overhead.

Abstract

While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise for multi-task learning through three main strategies: ensembling, which combines outputs from independent models; merging, which fuses model weights via parameter averaging; and routing, which integrates models in an input-dependent fashion. However, many design decisions in these approaches remain understudied, and the relative benefits of more sophisticated ensembling, merging and routing techniques are not fully understood. We empirically evaluate their trade-offs, addressing two key questions: What are the advantages of going beyond uniform ensembling or merging? And does the flexibility of routing justify its…
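As a rough illustration of the three strategies named in the abstract, the following toy sketch contrasts them on a tiny softmax classifier. It is not the paper's implementation: each "expert" here is just a small weight matrix standing in for an adapted model, and the names `predict`, `ensemble`, `merge`, `route`, and the linear `router` matrix are all hypothetical choices made for this example. Routing in particular is shown as one possible form (input-dependent merging coefficients), which is an assumption.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of floats.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def matvec(W, x):
    # Matrix-vector product for nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(W, x):
    # Toy "model": class probabilities from a single weight matrix.
    return softmax(matvec(W, x))

def ensemble(experts, x, coeffs):
    # Ensembling: weighted average of the experts' OUTPUT distributions.
    outs = [predict(E, x) for E in experts]
    return [sum(c * o[k] for c, o in zip(coeffs, outs))
            for k in range(len(outs[0]))]

def combine_weights(experts, coeffs):
    # Weighted average of the experts' WEIGHT matrices.
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(c * E[i][j] for c, E in zip(coeffs, experts))
             for j in range(cols)] for i in range(rows)]

def merge(experts, x, coeffs):
    # Merging: average the weights first, then run a single model.
    return predict(combine_weights(experts, coeffs), x)

def route(experts, x, router):
    # Routing (assumed form): coefficients depend on the input x
    # through a hypothetical linear router followed by a softmax.
    coeffs = softmax(matvec(router, x))
    return merge(experts, x, coeffs)

# Two hypothetical experts and a hypothetical router for 2-d inputs.
experts = [[[2.0, 0.0], [0.0, 1.0]],
           [[0.0, 1.0], [3.0, 0.0]]]
router = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.5]

uniform = [0.5, 0.5]
print("ensemble:", ensemble(experts, x, uniform))
print("merge:   ", merge(experts, x, uniform))
print("route:   ", route(experts, x, router))
```

In this sketch, uniform combination corresponds to equal entries in `coeffs`, while the non-uniform variants the abstract asks about would use unequal coefficients. Because the softmax is nonlinear, averaging outputs and averaging weights generally give different predictions, and ensembling needs one forward pass per expert whereas merging needs only one.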

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks