Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

TL;DR
This paper introduces adapter modules for transfer learning in NLP, enabling efficient multi-task learning by adding minimal parameters per task while maintaining near state-of-the-art performance.
Contribution
The paper proposes adapter modules as a parameter-efficient alternative to full fine-tuning for NLP transfer learning, demonstrating their effectiveness across multiple tasks.
Findings
Adapters achieve within 0.4% of full fine-tuning performance on GLUE.
Adapters add only 3.6% of the pre-trained model's parameters per task.
The original network's parameters stay frozen and are shared across tasks, so new tasks can be added without revisiting earlier ones.
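The storage argument behind these findings can be checked with simple arithmetic. The Python sketch below counts stored parameters in units of one pre-trained model. It assumes, for illustration only, that every task gets an adapter set of the same size (3.6% of the base model); the function names and the task count are hypothetical.

```python
def adapter_storage(num_tasks: int, per_task_fraction: float) -> float:
    """Stored parameters, in units of one pre-trained model: a single frozen,
    shared copy plus one adapter set per task (assumed equal size per task)."""
    return 1.0 + num_tasks * per_task_fraction


def finetune_storage(num_tasks: int) -> float:
    """Full fine-tuning stores a complete model copy for every task."""
    return float(num_tasks)


if __name__ == "__main__":
    tasks = 9  # illustrative task count
    print(f"adapters:    {adapter_storage(tasks, 0.036):.3f}x")
    print(f"fine-tuning: {finetune_storage(tasks):.1f}x")
```

With nine tasks, adapters store about 1.32 base models' worth of parameters, while full fine-tuning stores nine complete copies. If adapter sizes differ across tasks, the per-task fraction should be read as an average.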
Abstract
Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance, whilst adding only a few parameters per task. On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6%…
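The abstract does not spell out what an adapter module looks like. The sketch below assumes the bottleneck design commonly associated with this work: a down-projection from hidden size d to a small size m, a nonlinearity, an up-projection back to d, and a skip connection, with the up-projection initialized to zero so the module starts as the identity. The ReLU nonlinearity, the zero initialization, the class and function names, and the sizes in the usage example are illustrative assumptions, and where the modules sit inside each Transformer layer is not shown. Plain Python lists keep it dependency-free.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Return w @ x + b, where w is a row-major matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    """Elementwise ReLU (assumed nonlinearity for this sketch)."""
    return [max(0.0, v) for v in x]


class Adapter:
    """Hypothetical bottleneck adapter: d -> m -> d plus a skip connection.

    Only these weights would be trained per task; the pre-trained network
    they are inserted into stays frozen.
    """

    def __init__(self, d: int, m: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection (m x d) with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.01) for _ in range(d)] for _ in range(m)]
        self.b_down = [0.0] * m
        # Up-projection (d x m) set to zero, so the module starts as the identity.
        self.w_up = [[0.0] * m for _ in range(d)]
        self.b_up = [0.0] * d

    def num_params(self) -> int:
        """Trainable parameters: 2*d*m weights plus d + m biases."""
        d, m = len(self.w_up), len(self.w_down)
        return 2 * d * m + d + m

    def __call__(self, h: Vector) -> Vector:
        z = relu(matvec(self.w_down, h, self.b_down))  # project down to m
        delta = matvec(self.w_up, z, self.b_up)        # project back up to d
        return [hi + di for hi, di in zip(h, delta)]   # skip connection


if __name__ == "__main__":
    h = [float(i) for i in range(8)]
    print(Adapter(d=8, m=2)(h) == h)           # True: identity at initialization
    print(Adapter(d=1024, m=64).num_params())  # 132160
```

Only the adapter's 2dm + d + m weights would be trained for a new task, so the bottleneck size m controls the per-task overhead; with d = 1024 and m = 64 that is 132,160 parameters per module.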
Code & Models
- 🤗 FZH1996/fed-lora (model)
- 🤗 hywu/Camelidae-8x7B (model) · 1.4k downloads · ♡ 15
- 🤗 hywu/Camelidae-8x13B (model) · 1.4k downloads · ♡ 5
- 🤗 hywu/Camelidae-8x34B (model) · 1.5k downloads · ♡ 29
- 🤗 hywu/Qwen2idae-16x14B-v1.0 (model) · 17 downloads · ♡ 9
- 🤗 AdapterHub/llama2-7b-qadapter-seq-openassistant (model) · 2 downloads
- 🤗 moorebrett0/microformer (model) · ♡ 1
- 🤗 thomas-schweich/pawn-small (model) · 464 downloads
- 🤗 thomas-schweich/pawn-base (model) · 626 downloads
- 🤗 thomas-schweich/pawn-large (model) · 438 downloads
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections
