Scalable Multi-Domain Adaptation of Language Models using Modular Experts

Peter Schafhalter; Shun Liao; Yanqi Zhou; Chih-Kuan Yeh; Arun Kandoor; James Laudon

arXiv:2410.10181 · cs.CL · October 25, 2024

Open Access

TL;DR

This paper introduces Modular Domain Experts (MoDE), a scalable mixture-of-experts approach that enhances domain-specific adaptation of language models, balancing performance, knowledge retention, and efficiency.

Contribution

MoDE presents a novel modular architecture with independently trained experts that improves domain adaptation efficiency and performance over existing methods.

Findings

01. MoDE achieves comparable performance to full fine-tuning.

02. MoDE retains 1.65% more general knowledge.

03. Training speeds increase by up to 38%.

Abstract

Domain-specific adaptation is critical to maximizing the performance of pre-trained language models (PLMs) on one or multiple targeted tasks, especially under resource-constrained use cases, such as edge devices. However, existing methods often struggle to balance domain-specific performance, retention of general knowledge, and efficiency for training and inference. To address these challenges, we propose Modular Domain Experts (MoDE). MoDE is a mixture-of-experts architecture that augments a general PLM with modular, domain-specialized experts. These experts are trained independently and composed together via a lightweight training process. In contrast to standard low-rank adaptation methods, each MoDE expert consists of several transformer layers, which scale better with more training examples and larger parameter counts. Our evaluation demonstrates that MoDE achieves comparable…
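
The idea of a frozen general model augmented with independently built experts can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how experts attach to the backbone or how they are routed, so the additive mixing and the per-expert gate scalars below are assumptions, and every name (`make_layer`, `make_stack`, `ModularModel`, `gate_logits`) is hypothetical. Each "layer" is a small residual tanh projection standing in for a transformer layer.

```python
import math
import random


def make_layer(dim, rng):
    """Toy stand-in for one transformer layer: a residual tanh projection."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        proj = [sum(w[i][j] * x[j] for j in range(dim)) for i in range(dim)]
        return [xi + math.tanh(p) for xi, p in zip(x, proj)]

    return layer


def make_stack(dim, depth, rng):
    """Several layers applied in sequence (used for backbone and experts)."""
    layers = [make_layer(dim, rng) for _ in range(depth)]

    def stack(x):
        for layer in layers:
            x = layer(x)
        return x

    return stack


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ModularModel:
    """Frozen backbone plus pluggable domain experts (assumed design)."""

    def __init__(self, dim, rng):
        self.backbone = make_stack(dim, depth=2, rng=rng)  # never modified
        self.experts = {}      # domain name -> multi-layer expert stack
        self.gate_logits = {}  # domain name -> scalar; the only "composition" weights

    def add_expert(self, name, expert, gate_logit=0.0):
        # Experts are built elsewhere and attached without touching the backbone.
        self.experts[name] = expert
        self.gate_logits[name] = gate_logit

    def forward(self, x):
        h = self.backbone(x)
        if not self.experts:
            return h
        names = sorted(self.experts)
        weights = softmax([self.gate_logits[n] for n in names])
        out = list(h)
        for name, weight in zip(names, weights):
            # Assumption: each expert contributes a gated additive update.
            delta = [e - xi for e, xi in zip(self.experts[name](x), x)]
            out = [o + weight * d for o, d in zip(out, delta)]
        return out


rng = random.Random(0)
model = ModularModel(dim=4, rng=rng)
model.add_expert("code", make_stack(4, 3, rng))
model.add_expert("math", make_stack(4, 3, rng))
output = model.forward([0.5, -0.2, 0.1, 0.3])
```

In this sketch, adding or removing an expert leaves the backbone's own output unchanged, and only the gate scalars would need tuning when experts are combined. That is one way to read the "lightweight training process" the abstract mentions, not a confirmed description of it.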

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications