Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

Pierre Ablin; Angelos Katharopoulos; Skyler Seto; David Grangier

arXiv:2502.01804 · cs.LG · February 5, 2025

Open Access

TL;DR

The paper introduces Soup-of-Experts, a flexible model architecture that efficiently creates specialized models for different data domains at test time by linearly combining expert parameters, without retraining.

Contribution

The paper presents an architecture that instantiates a domain-specific model by averaging the parameters of a bank of experts, so a new specialist can be produced for any domain mixture without a separate training run.

Findings

01. Effective in creating small specialized models for language tasks

02. Allows quick adaptation to multiple domains without retraining

03. Maintains performance across diverse data domains

Abstract

Machine learning models are routinely trained on a mixture of different data domains. Different domain weights yield very different downstream performances. We propose the Soup-of-Experts, a novel architecture that can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model. Our architecture consists of a bank of expert parameters, which are linearly combined to instantiate one model. We learn the linear combination coefficients as a function of the input domain weights. To train this architecture, we sample random domain weights, instantiate the corresponding model, and backprop through one batch of data sampled with these domain weights. We demonstrate how our approach obtains small specialized models on several language modeling tasks quickly. Soup-of-Experts are particularly appealing when one needs to ship many…
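The instantiation step described in the abstract can be illustrated with a minimal NumPy sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the sizes, the flattened parameter vectors, and in particular the coefficient function, which is shown as a single hypothetical linear map (the abstract only says the coefficients are learned as a function of the domain weights). The gradient update is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_domains = 4   # number of data domains in the training mixture
n_experts = 3   # number of entries in the expert bank
n_params = 10   # flattened parameter count of one instantiated model

# Bank of expert parameters: one flattened parameter vector per expert.
experts = rng.normal(size=(n_experts, n_params))

# Assumed coefficient function: a linear map from domain weights to
# combination coefficients. In training, W and b would be learned
# jointly with the experts.
W = rng.normal(size=(n_experts, n_domains))
b = np.zeros(n_experts)


def sample_domain_weights(rng, n_domains):
    """Draw random domain weights on the probability simplex."""
    return rng.dirichlet(np.ones(n_domains))


def coefficients(domain_weights):
    """Map domain weights to one coefficient per expert."""
    return W @ domain_weights + b


def instantiate(domain_weights):
    """Linearly combine the expert bank into a single parameter vector."""
    alpha = coefficients(domain_weights)
    return alpha @ experts


# One training-style iteration, without the gradient step:
h = sample_domain_weights(rng, n_domains)          # random domain weights
theta = instantiate(h)                             # parameters of the instantiated model
batch_domains = rng.choice(n_domains, size=8, p=h) # domain of each example in the batch
```

In an actual training loop, the loss of the model with parameters `theta` on the sampled batch would be backpropagated into both `experts` and the coefficient function. At test time only `instantiate` is needed, which is a single weighted sum over the bank.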

Taxonomy

Topics: Scientific Computing and Data Management · Expert finding and Q&A systems