AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge

TL;DR
AdapterSoup is a simple yet effective method that averages domain-specific adapters in weight space, enhancing the generalization of pretrained language models to new domains without additional training.
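To make weight-space averaging concrete, here is a minimal sketch. It assumes uniform (equal-weight) averaging of adapters that share one architecture. Parameters are plain lists of floats rather than real tensors, and the function and parameter names are hypothetical, not taken from the paper's code.

```python
from typing import Dict, List

# Hypothetical representation: each adapter maps a parameter name to a flat
# list of floats (a real adapter would store tensors of the same shapes).
Adapter = Dict[str, List[float]]


def average_adapters(adapters: List[Adapter]) -> Adapter:
    """Uniformly average adapters with identical parameter names and sizes."""
    if not adapters:
        raise ValueError("need at least one adapter to average")
    names = adapters[0].keys()
    for adapter in adapters[1:]:
        if adapter.keys() != names:
            raise ValueError("adapters must have identical parameter names")
    n = len(adapters)
    soup: Adapter = {}
    for name in names:
        if len({len(adapter[name]) for adapter in adapters}) != 1:
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise mean of this parameter across all adapters.
        columns = zip(*(adapter[name] for adapter in adapters))
        soup[name] = [sum(values) / n for values in columns]
    return soup


# Usage: two toy "domain adapters", each with one weight vector and one bias.
news = {"down.weight": [0.25, 0.5], "down.bias": [0.0]}
reviews = {"down.weight": [0.75, 0.0], "down.bias": [1.0]}
print(average_adapters([news, reviews]))
# -> {'down.weight': [0.5, 0.25], 'down.bias': [0.5]}
```

The averaged "soup" has the same shape as a single adapter. It can therefore be plugged into the frozen pretrained model in place of one domain adapter, which is why no extra training is needed at test time.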
Contribution
This paper introduces AdapterSoup, a novel weight averaging technique for adapters trained on different domains, improving domain adaptation efficiency and performance in pretrained language models.
Findings
AdapterSoup consistently improves performance on new domains.
Weight averaging preserves in-domain and out-of-domain performance.
Clustering-based adapter selection yields the best results.
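The clustering-based selection step can be sketched as follows. This assumes each training domain already has an embedding vector (for example, a mean text embedding) and a cluster label from some clustering such as k-means. The novel domain is assigned to the nearest cluster centroid, and the adapters of the domains in that cluster are the ones to average. The domain names, vectors, and helper functions are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def select_by_cluster(
    novel: Sequence[float],
    domain_embeddings: Dict[str, Sequence[float]],
    domain_clusters: Dict[str, int],
) -> List[str]:
    """Return the training domains in the cluster closest to `novel`."""
    # Group training domains by their precomputed cluster label.
    members: Dict[int, List[str]] = {}
    for domain, label in domain_clusters.items():
        members.setdefault(label, []).append(domain)

    # Euclidean distance from the novel-domain embedding to a cluster centroid.
    def distance(label: int) -> float:
        c = centroid([domain_embeddings[d] for d in members[label]])
        return math.dist(novel, c)

    best = min(members, key=distance)
    return sorted(members[best])


embeddings = {"news": [0.0, 1.0], "politics": [0.0, 0.8], "recipes": [1.0, 0.0]}
clusters = {"news": 0, "politics": 0, "recipes": 1}
print(select_by_cluster([0.1, 0.9], embeddings, clusters))  # ['news', 'politics']
```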
Abstract
Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance on new domains without extra training. We also explore weight averaging…
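The abstract leaves open how the adapters to average are chosen for a novel domain. A simple rule, shown here only as an assumed illustration, ranks the training domains by cosine similarity between their embeddings and the novel domain's embedding, then keeps the top k. The embeddings, the value of k, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def select_top_k(
    novel: Sequence[float],
    domain_embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Return the k training domains most similar to the novel domain."""
    ranked = sorted(
        domain_embeddings,
        key=lambda d: cosine(novel, domain_embeddings[d]),
        reverse=True,  # most similar first
    )
    return ranked[:k]


embeddings = {"news": [0.0, 1.0], "politics": [0.5, 0.5], "recipes": [1.0, 0.0]}
print(select_top_k([0.1, 0.9], embeddings, k=2))  # ['news', 'politics']
```

Because each domain adapter is trained independently, the selection rule can be changed at test time without retraining any adapter.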
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Test · Adapter
