Exploring the Benefits of Training Expert Language Models over Instruction Tuning
Joel Jang, Seungone Kim, Seonghyeon Ye, Doyoung Kim, Lajanugen Logeswaran, Moontae Lee, Kyungjae Lee, Minjoon Seo

TL;DR
A single expert language model fine-tuned on one task can outperform a multi-task instruction-tuned model trained on 300+ tasks; training one expert per task also avoids negative transfer and enables continual learning.
Contribution
This work shows that an expert model fine-tuned on a single task can surpass a multi-task instruction-tuned model, challenging the assumption that scaling the number of training tasks is what makes such models stronger, and opening the way to modular, continual-learning approaches.
Findings
An expert LM fine-tuned on a single task outperforms an MT LM trained on 300+ tasks by a mean accuracy of 3.20% on 11 unseen datasets and 1.29% on 13 BIG-bench datasets.
Training separate experts avoids negative transfer and catastrophic forgetting.
Experts can be combined for compositional capabilities.
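The second and third findings can be illustrated with a toy sketch. It assumes each expert is simply a callable from prompt to answer; the `ExpertLibrary` class and its method names are hypothetical and do not come from the paper. Composition here is plain output chaining, chosen for simplicity; this summary does not say how the paper actually combines experts.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a fine-tuned expert LM: any prompt -> answer callable.
Expert = Callable[[str], str]


class ExpertLibrary:
    """Toy registry holding one independently trained expert per training task."""

    def __init__(self) -> None:
        self._experts: Dict[str, Expert] = {}

    def add_expert(self, task: str, expert: Expert) -> None:
        # Learning a new task only adds an entry. Existing experts are never
        # modified, so earlier tasks cannot be forgotten or interfered with.
        if task in self._experts:
            raise ValueError(f"an expert for task {task!r} already exists")
        self._experts[task] = expert

    def answer(self, task: str, prompt: str) -> str:
        # Route the prompt to the single expert registered for this task.
        return self._experts[task](prompt)

    def compose(self, first: str, second: str) -> Expert:
        # Assumption: chain the output of one expert into the next.
        # This is an illustrative choice, not the paper's stated method.
        first_expert = self._experts[first]
        second_expert = self._experts[second]
        return lambda prompt: second_expert(first_expert(prompt))
```

In a multi-task model, by contrast, adding a task means updating shared weights, which is where negative transfer and forgetting can arise.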
Abstract
Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks. Previous work has shown that scaling the number of training tasks is the key component in making stronger MT LMs. In this work, we report an unexpected finding that an expert LM fine-tuned on just a single task can outperform an MT LM trained with 300+ different tasks on 11 different unseen datasets and on 13 datasets of the BIG-bench benchmark by a mean accuracy of 3.20% and 1.29%, respectively. This finding casts doubt on the previously held belief that simply scaling the number of tasks makes stronger MT LMs. Leveraging this finding, we further show that this distributed approach of training a separate expert LM per training task instead of a single MT LM for zero-shot inference possesses many benefits…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
