DEMix Layers: Disentangling Domains for Modular Language Modeling

Suchin Gururangan; Mike Lewis; Ari Holtzman; Noah A. Smith; Luke Zettlemoyer

arXiv:2108.05036·cs.CL·August 24, 2021

TL;DR

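This paper presents DEMix layers, a modular approach to conditioning a language model on the domain of its input. It reports lower test-time perplexity, more efficient training, and better generalization to heterogeneous or unseen domains, with experts that can be mixed, added, or removed after initial training.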

Contribution

Introduces DEMix layers: collections of domain-specialized expert feedforward networks that make a language model modular, so experts can be mixed, added, or removed after initial training.

Findings

01. Reduces test perplexity across domains

02. Enhances training efficiency for large LMs

03. Allows rapid domain adaptation and control

Abstract

We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular: experts can be mixed, added or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce test-time perplexity, increase training efficiency, and enable rapid adaptation with little overhead. We show that mixing experts during inference, using a parameter-free weighted ensemble, allows the model to better generalize to heterogeneous or unseen domains. We also show that experts can be added to iteratively incorporate new domains without forgetting older ones, and that experts can be removed to restrict access to unwanted domains, without additional…
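The mixing step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not say here how the ensemble weights are computed, so the sketch assumes they come from Bayes' rule with a uniform prior over domains (a softmax over each expert's log-likelihood of the context). The names `domain_weights`, `mix_next_token`, `expert_probs`, and `context_loglik` are hypothetical, and all numbers are made up.

```python
import math

def domain_weights(context_loglik):
    """Map each domain's log-likelihood of the context to a mixture weight.

    Assumption (not stated in the abstract): weights follow Bayes' rule
    with a uniform prior over domains, i.e. a softmax of the log-likelihoods.
    """
    top = max(context_loglik.values())  # subtract the max for numerical stability
    unnorm = {d: math.exp(ll - top) for d, ll in context_loglik.items()}
    total = sum(unnorm.values())
    return {d: u / total for d, u in unnorm.items()}

def mix_next_token(expert_probs, weights):
    """Weighted ensemble of per-expert next-token distributions."""
    vocab_size = len(next(iter(expert_probs.values())))
    mixed = [0.0] * vocab_size
    for domain, probs in expert_probs.items():
        for i, p in enumerate(probs):
            mixed[i] += weights[domain] * p
    return mixed

# Toy, made-up numbers: two experts over a three-token vocabulary.
expert_probs = {"news": [0.7, 0.2, 0.1], "code": [0.1, 0.1, 0.8]}
context_loglik = {"news": -2.0, "code": -5.0}

weights = domain_weights(context_loglik)
mixed = mix_next_token(expert_probs, weights)

# Removing an expert here is just dropping its entry; the remaining
# weights are renormalized and no learned parameters are involved.
del expert_probs["code"], context_loglik["code"]
news_only = mix_next_token(expert_probs, domain_weights(context_loglik))
```

In this sketch the ensemble has no learned parameters of its own: the weights are recomputed from the experts' own likelihoods, which is one way to read "parameter-free", and dropping an expert only changes which entries take part in the sum.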

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis