Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts

Rhui Dih Lee; Laura Wynter; Raghu Kiran Ganti

arXiv:2408.17280·cs.AI·September 12, 2024

TL;DR

This paper introduces a toolkit for efficiently creating and configuring mixture-of-domain-experts models from trained models or adapters, facilitating low-cost domain specialization in large language models.

Contribution

The paper provides a practical toolkit, together with guidance on architecture choices, for building Mixture-of-Domain-Experts (MOE) models, enabling flexible and low-cost domain adaptation of large language models.

Findings

01 The toolkit supports creating an MOE from trained models or from adapters

02 Extensive testing and guidance are provided for architecture design

03 A public repository is available for community use

Abstract

We present a toolkit for creating low-cost Mixture-of-Domain-Experts (MOE) from trained models. The toolkit can be used for creating a mixture from models or from adapters. We perform extensive tests and offer guidance on defining the architecture of the resulting MOE using the toolkit. A public repository is available.
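To make the idea of mixing trained models into an MOE concrete, here is a minimal sketch of a generic mixture-of-experts forward pass. It is not the toolkit's API: the function names (`make_expert`, `expert_forward`, `moe_forward`), the two-layer feed-forward experts, the randomly initialized linear gate, and the top-k routing are all assumptions chosen for illustration. In practice each expert would be a layer taken from a separately trained domain model rather than random weights.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_expert(dim, hidden, seed):
    """Hypothetical stand-in for a feed-forward block of a trained domain model."""
    rng = random.Random(seed)
    W_in = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden)]
    W_out = [[rng.gauss(0, 0.1) for _ in range(hidden)] for _ in range(dim)]
    return W_in, W_out


def expert_forward(expert, x):
    """Two-layer feed-forward pass with a ReLU in between."""
    W_in, W_out = expert
    h = [max(0.0, v) for v in matvec(W_in, x)]
    return matvec(W_out, h)


def moe_forward(experts, gate_W, x, top_k=2):
    """Route x to the top_k experts and blend their outputs.

    The gate is a plain linear layer (an assumption); its logits are
    renormalized with a softmax over the selected experts only.
    """
    logits = matvec(gate_W, x)
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = expert_forward(experts[i], x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, dict(zip(top, weights))


# Usage: three hypothetical domain experts and an untrained gate.
dim, hidden, n_experts = 4, 8, 3
experts = [make_expert(dim, hidden, seed=s) for s in range(n_experts)]
gate_rng = random.Random(0)
gate_W = [[gate_rng.gauss(0, 1.0) for _ in range(dim)] for _ in range(n_experts)]
x = [1.0, -0.5, 0.25, 2.0]
out, routing = moe_forward(experts, gate_W, x, top_k=2)
```

Here `routing` maps each selected expert index to its mixing weight, and the weights sum to one. Whether the gate is trained, how many experts are active per token, and which layers are mixed are the kind of architecture choices the abstract says the paper gives guidance on; the sketch fixes them arbitrarily.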

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems

Methods: Mixture of Experts