When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning

Orion Weller; Kevin Seppi; Matt Gardner

arXiv:2205.08124 · cs.CL · May 18, 2022



TL;DR

This paper compares three transfer learning strategies for pre-trained encoders in NLP (STILTs, pairwise MTL, and MTL-ALL) on GLUE. It identifies a simple heuristic, based on dataset sizes, for choosing between pairwise MTL and STILTs, and finds that MTL-ALL generally performs worse than the pairwise methods.

Contribution

The paper provides a comprehensive comparison of the three transfer learning methods on GLUE, proposes a heuristic based on dataset size for choosing between pairwise MTL and STILTs, and validates that heuristic empirically.

Findings

01 Pairwise MTL outperforms STILTs when the target task has fewer instances than the supporting task; in the reverse case, STILTs outperforms pairwise MTL.

02 The heuristic applies in over 92% of cases on GLUE.

03 MTL-ALL generally performs worse than pairwise methods.
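
The dataset-size heuristic from the findings can be written as a one-line decision rule. The sketch below is illustrative only and is not taken from the paper's code: the function name `choose_transfer_method` and its parameters are hypothetical, and the handling of equal dataset sizes (returning STILTs) is an assumption, since this page does not say what happens in that case.

```python
def choose_transfer_method(target_size: int, support_size: int) -> str:
    """Pick a pairwise transfer method from training-set sizes.

    Hypothetical helper illustrating the heuristic: pairwise MTL when the
    target task has fewer instances than the supporting task, STILTs otherwise.
    """
    if target_size < 0 or support_size < 0:
        raise ValueError("dataset sizes must be non-negative")
    # Assumption: ties fall through to STILTs; the source does not cover them.
    if target_size < support_size:
        return "pairwise MTL"
    return "STILTs"


# Illustrative sizes only: a small target task with a large supporting task.
print(choose_transfer_method(target_size=2_500, support_size=390_000))
# The reverse pairing: a large target task with a small supporting task.
print(choose_transfer_method(target_size=390_000, support_size=2_500))
```

The rule only decides between the two pairwise methods; it says nothing about MTL-ALL, which the findings report as generally weaker than both.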

Abstract

Transfer learning (TL) in natural language processing (NLP) has seen a surge of interest in recent years, as pre-trained models have shown an impressive ability to transfer to novel tasks. Three main strategies have emerged for making use of multiple supervised datasets during fine-tuning: training on an intermediate task before training on the target task (STILTs), using multi-task learning (MTL) to train jointly on a supplementary task and the target task (pairwise MTL), or simply using MTL to train jointly on all available datasets (MTL-ALL). In this work, we compare all three TL methods in a comprehensive analysis on the GLUE dataset suite. We find that there is a simple heuristic for when to use one of these techniques over the other: pairwise MTL is better than STILTs when the target task has fewer instances than the supporting task and vice versa. We show that this holds true in…
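
The three strategies named in the abstract differ mainly in the order in which batches from each task reach the model. The sketch below shows that difference as lists of task labels and is a rough illustration only: the function names are hypothetical, no model is trained, and uniform shuffling of batches for the joint-training variants is an assumption, since the abstract does not specify a sampling scheme.

```python
import random
from typing import Dict, List


def stilts_order(n_support: int, n_target: int) -> List[str]:
    """STILTs: all supporting-task batches first, then all target-task batches."""
    return ["support"] * n_support + ["target"] * n_target


def pairwise_mtl_order(n_support: int, n_target: int, seed: int = 0) -> List[str]:
    """Pairwise MTL: batches from the two tasks are mixed in one training run."""
    order = ["support"] * n_support + ["target"] * n_target
    # Assumption: a uniform shuffle stands in for whatever sampling is used.
    random.Random(seed).shuffle(order)
    return order


def mtl_all_order(batches_per_task: Dict[str, int], seed: int = 0) -> List[str]:
    """MTL-ALL: batches from every available task are mixed in one training run."""
    order = [task for task, n in batches_per_task.items() for _ in range(n)]
    random.Random(seed).shuffle(order)
    return order


print(stilts_order(n_support=3, n_target=2))
print(pairwise_mtl_order(n_support=3, n_target=2))
print(mtl_all_order({"task_a": 2, "task_b": 2, "task_c": 1}))
```

In the sequential case the target task is always seen last, while in both joint cases target batches are spread throughout training.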


Code & Models

Repositories

orionw/mtlvsift (PyTorch, official)


Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications