BTS: Harmonizing Specialized Experts into a Generalist LLM
Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis

TL;DR
BTS introduces a modular training algorithm that efficiently combines specialized LLM experts into a versatile generalist model using lightweight stitch layers, preserving expert capabilities and enabling flexible domain integration.
Contribution
The paper presents BTS, a novel method for merging independently trained experts into a generalist LLM with minimal retraining and high flexibility.
Findings
BTS achieves superior performance on downstream tasks compared to other merging methods.
The approach maintains the specialized skills of individual experts.
Experts can be added or removed with minimal additional training.
Abstract
We present Branch-Train-Stitch (BTS), an efficient and flexible training algorithm for combining independently trained large language model (LLM) experts into a single, capable generalist model. Following Li et al., we start with a single seed language model which is branched into domain-specific (e.g., coding or math) experts with continual pretraining. BTS combines experts into a generalist model using lightweight stitch layers, which are inserted between frozen experts and the seed LLM, and trained on a small datamix of the expert domains. Stitch layers enable the seed LLM to integrate representations from any number of experts during the forward pass, allowing it to generalize to new domains, despite remaining frozen. Because BTS does not alter the constituent LLMs, BTS provides a modular and flexible approach: experts can be easily removed and new experts can be added with only a…
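The abstract describes stitch layers only at a high level, so the following is a minimal sketch of the general idea, not the paper's architecture. It assumes a stitch layer is a small set of trainable gate weights that mixes the hidden state of a frozen seed layer with the hidden states of frozen expert layers. The names FrozenLayer, StitchLayer, gate_logits, and stitched_forward are hypothetical, and the softmax-weighted sum is an assumed stand-in for whatever mixing the authors actually use.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FrozenLayer:
    """Toy stand-in for one frozen layer of the seed or an expert model.

    Its parameters are fixed at construction and never updated.
    """

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def __call__(self, hidden):
        return [math.tanh(self.scale * v + self.shift) for v in hidden]


class StitchLayer:
    """Hypothetical stitch layer: one gate logit per source model.

    In this sketch the gate logits are the only values that training
    would update (an assumption; the training loop is omitted).
    """

    def __init__(self, expert_names):
        self.gate_logits = {"seed": 0.0}
        for name in expert_names:
            self.gate_logits[name] = 0.0

    def add_expert(self, name):
        # A new expert only needs a new gate entry; frozen models are untouched.
        self.gate_logits[name] = 0.0

    def remove_expert(self, name):
        del self.gate_logits[name]

    def __call__(self, seed_hidden, expert_hiddens):
        names = ["seed"] + sorted(expert_hiddens)
        sources = [seed_hidden] + [expert_hiddens[n] for n in sorted(expert_hiddens)]
        weights = softmax([self.gate_logits[n] for n in names])
        # Weighted sum of the seed and expert hidden states, per dimension.
        return [
            sum(w * src[i] for w, src in zip(weights, sources))
            for i in range(len(seed_hidden))
        ]


def stitched_forward(x, seed_layer, expert_layers, stitch):
    """Run the frozen seed and experts, then mix their outputs via the stitch."""
    seed_hidden = seed_layer(x)
    expert_hiddens = {name: layer(x) for name, layer in expert_layers.items()}
    return stitch(seed_hidden, expert_hiddens)


x = [0.5, -1.0, 2.0]
seed = FrozenLayer(1.0, 0.0)
experts = {"code": FrozenLayer(0.5, 0.1), "math": FrozenLayer(2.0, -0.1)}
stitch = StitchLayer(experts)
mixed = stitched_forward(x, seed, experts, stitch)
print(mixed)
```

In this toy version, removing an expert means deleting its entry from the experts dictionary and calling remove_expert on the stitch layer. The seed and the remaining experts are never modified, which mirrors the modularity claim in the abstract.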
Taxonomy
Topics: Artificial Intelligence in Law · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
