Exploring the Benefits of Training Expert Language Models over Instruction Tuning

Joel Jang; Seungone Kim; Seonghyeon Ye; Doyoung Kim; Lajanugen Logeswaran; Moontae Lee; Kyungjae Lee; Minjoon Seo

arXiv:2302.03202·cs.CL·February 10, 2023·20 cites

Open Access · 2 Repos

TL;DR

Training a single expert language model on one task can outperform multi-task instruction-tuned models trained on hundreds of tasks, offering benefits like avoiding negative transfer and enabling continual learning.

Contribution

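This work shows that an expert model fine-tuned on a single task can surpass a multi-task instruction-tuned model, challenging the belief that simply scaling the number of training tasks yields stronger models, and it argues that training a separate expert per task enables a modular, continual-learning approach.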

Findings

01. An expert LM fine-tuned on a single task outperforms an MT LM trained on 300+ tasks on unseen datasets.

02. Training separate experts avoids negative transfer and catastrophic forgetting.

03. Experts can be combined for compositional capabilities.

Abstract

Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks. Previous work has shown that scaling the number of training tasks is the key component in making stronger MT LMs. In this work, we report an unexpected finding that an expert LM fine-tuned on just a single task can outperform an MT LM trained with 300+ different tasks on 11 different unseen datasets and on 13 datasets of the BIG-bench benchmark by a mean accuracy of 3.20% and 1.29%, respectively. This finding casts doubt on the previously held belief that simply scaling the number of tasks makes stronger MT LMs. Leveraging this finding, we further show that this distributed approach of training a separate expert LM per training task instead of a single MT LM for zero-shot inference possesses many benefits…
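The "separate expert LM per training task" idea in the abstract can be pictured with a minimal sketch. Everything here is hypothetical and only illustrates the modular structure: `ExpertLibrary`, `add_expert`, and `answer` are invented names, each expert is a plain Python function standing in for a fine-tuned LM, and the choice of which expert to use is assumed to be given rather than taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for a fine-tuned expert LM: maps a prompt to an answer.
Expert = Callable[[str], str]


@dataclass
class ExpertLibrary:
    """One independent expert per training task (illustrative only)."""

    experts: Dict[str, Expert] = field(default_factory=dict)

    def add_expert(self, task: str, expert: Expert) -> None:
        # Adding a task only creates a new entry; existing experts are untouched,
        # which is the intuition behind avoiding catastrophic forgetting.
        self.experts[task] = expert

    def answer(self, task: str, prompt: str) -> str:
        # Assumption: the caller already knows which expert to use.
        # How an expert is selected for an unseen task is not modeled here.
        return self.experts[task](prompt)


library = ExpertLibrary()
library.add_expert("sentiment", lambda p: "positive" if "good" in p else "negative")
before = library.answer("sentiment", "a good film")

# Learning a new task later does not change the behavior of earlier experts.
library.add_expert("topic", lambda p: "sports" if "match" in p else "other")
after = library.answer("sentiment", "a good film")
```

In a single multi-task model, by contrast, learning a new task updates shared weights, so earlier behavior is not guaranteed to stay fixed; the sketch only makes that structural difference concrete.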

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification