Parameter-Efficient Transfer Learning for NLP

Neil Houlsby; Andrei Giurgiu; Stanislaw Jastrzebski; Bruna Morrone; Quentin de Laroussilhe; Andrea Gesmundo; Mona Attariyan; Sylvain Gelly

arXiv:1902.00751 · cs.LG · June 14, 2019 · 144 cites


TL;DR

This paper introduces adapter modules for transfer learning in NLP: the pre-trained network stays fixed and each new task adds only a few trainable parameters, while performance remains near state of the art.

Contribution

The paper proposes adapter modules as a parameter-efficient alternative to full fine-tuning for NLP transfer learning, and demonstrates them by transferring BERT to 26 diverse text classification tasks, including the GLUE benchmark.
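To make the idea concrete, here is a minimal sketch of what one adapter module might look like. The text above does not spell out the module's internals, so the sketch assumes a bottleneck design: project the hidden vector down from d to m dimensions, apply a nonlinearity, project back up to d, and add a skip connection, with small initial weights so the module starts close to the identity. The class name `Adapter`, the ReLU nonlinearity, and the `init_scale` value are illustrative choices, not taken from the source.

```python
import random


class Adapter:
    """Hypothetical bottleneck adapter: d -> m -> d with a skip connection."""

    def __init__(self, d, m, init_scale=1e-3, seed=0):
        rng = random.Random(seed)
        self.d, self.m = d, m
        # Small random weights (assumed init) keep the module near the identity.
        self.w_down = [[rng.gauss(0.0, init_scale) for _ in range(m)] for _ in range(d)]
        self.b_down = [0.0] * m
        self.w_up = [[rng.gauss(0.0, init_scale) for _ in range(d)] for _ in range(m)]
        self.b_up = [0.0] * d

    def num_params(self):
        # Two weight matrices (d*m each) plus the two bias vectors.
        return 2 * self.d * self.m + self.d + self.m

    def __call__(self, h):
        # Down-projection followed by ReLU (the nonlinearity is an assumption).
        z = [
            max(0.0, sum(h[i] * self.w_down[i][j] for i in range(self.d)) + self.b_down[j])
            for j in range(self.m)
        ]
        # Up-projection back to the original width.
        out = [
            sum(z[j] * self.w_up[j][k] for j in range(self.m)) + self.b_up[k]
            for k in range(self.d)
        ]
        # Skip connection: the adapter only adds a small correction to h.
        return [h_k + o_k for h_k, o_k in zip(h, out)]


adapter = Adapter(d=8, m=2)
hidden = [1.0] * 8
adapted = adapter(hidden)
```

In this sketch only the adapter's weights would be trained for a new task; the surrounding pre-trained layers are left untouched, which is what lets many tasks share one base model.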

Findings

01. On GLUE, adapters come within 0.4% of the performance of full fine-tuning.

02. Adapters add only 3.6% of parameters per task.

03. The original network's parameters remain fixed, giving a high degree of parameter sharing; new tasks can be added without revisiting previous ones.
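The storage saving behind these findings can be checked with simple arithmetic. The sketch below compares the total number of parameters stored, as a multiple of the base model size, for full fine-tuning (one full copy per task) against adapters (one shared frozen base plus a small per-task addition). The 3.6% figure comes from the findings above; the task count of 9 and the function names are illustrative assumptions.

```python
def fine_tuning_total(num_tasks):
    """Full fine-tuning stores one complete model copy per task."""
    return float(num_tasks)


def adapter_total(num_tasks, per_task_fraction):
    """Adapters store one shared frozen base plus a small per-task addition."""
    return 1.0 + num_tasks * per_task_fraction


NUM_TASKS = 9            # assumed task count, for illustration only
ADAPTER_FRACTION = 0.036  # 3.6% of parameters per task, as reported above

full = fine_tuning_total(NUM_TASKS)
shared = adapter_total(NUM_TASKS, ADAPTER_FRACTION)

print(f"full fine-tuning: {full:.2f}x base model size")
print(f"adapters:         {shared:.2f}x base model size")
```

Under these assumptions the adapter total grows by only 0.036 base-model sizes per extra task, whereas full fine-tuning grows by a whole model each time.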

Abstract

Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6%…


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections