AdapterHub: A Framework for Adapting Transformers

Jonas Pfeiffer; Andreas Rücklé; Clifton Poth; Aishwarya Kamath; Ivan Vulić; Sebastian Ruder; Kyunghyun Cho; Iryna Gurevych

arXiv:2007.07779·cs.CL·October 7, 2020

TL;DR

AdapterHub is a framework that simplifies the sharing, integration, and application of adapter layers in pre-trained transformer models, enabling efficient multi-task and multilingual NLP adaptations without full model fine-tuning.

Contribution

The paper introduces a framework, built on HuggingFace Transformers, that integrates pre-trained adapters dynamically, making it easy to share adapters and to adapt large pre-trained models to specific tasks.

Findings

01

Enables quick and seamless adaptation of models across tasks and languages.

02

Supports sharing and integrating adapters with minimal code changes.

03

Facilitates scalable, low-resource NLP applications.

Abstract

The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters -- small learnt bottleneck layers inserted within each layer of a pre-trained model -- ameliorate this issue by avoiding full fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptations of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across…
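
As a rough illustration of the "small learnt bottleneck layers" the abstract describes, the sketch below implements one common adapter shape in plain Python: a down-projection, a nonlinearity, an up-projection, and a residual connection. This is not AdapterHub's API or the authors' exact architecture; the helper names (`make_adapter`, `adapter_forward`, `adapter_param_count`), the ReLU activation, the initialisation scale, and the sizes used are all assumptions made for illustration.

```python
import random


def make_adapter(hidden_size, bottleneck_size, seed=0):
    """Create weights for one bottleneck adapter (hypothetical helper)."""
    rng = random.Random(seed)
    scale = 0.01  # assumed small init so the adapter starts near the identity
    return {
        # Down-projection: hidden_size -> bottleneck_size
        "down_w": [[rng.gauss(0.0, scale) for _ in range(hidden_size)]
                   for _ in range(bottleneck_size)],
        "down_b": [0.0] * bottleneck_size,
        # Up-projection: bottleneck_size -> hidden_size
        "up_w": [[rng.gauss(0.0, scale) for _ in range(bottleneck_size)]
                 for _ in range(hidden_size)],
        "up_b": [0.0] * hidden_size,
    }


def linear(weights, bias, x):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def adapter_forward(adapter, hidden):
    """Down-project, apply ReLU, up-project, then add the residual."""
    down = linear(adapter["down_w"], adapter["down_b"], hidden)
    activated = [max(0.0, v) for v in down]
    up = linear(adapter["up_w"], adapter["up_b"], activated)
    return [h + u for h, u in zip(hidden, up)]


def adapter_param_count(hidden_size, bottleneck_size):
    """Two weight matrices plus two bias vectors."""
    return 2 * hidden_size * bottleneck_size + hidden_size + bottleneck_size


# Toy usage with illustrative sizes (not taken from the paper).
adapter = make_adapter(hidden_size=8, bottleneck_size=2)
out = adapter_forward(adapter, [1.0] * 8)
print(len(out))
print(adapter_param_count(768, 48))
```

Under these assumptions, an adapter for a hidden size of 768 with a bottleneck of 48 has 74,544 parameters, which suggests why storing and sharing adapters is far cheaper than storing a fully fine-tuned model with millions or billions of parameters.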

Taxonomy

Methods: Linear Layer · Attention Dropout · Adam · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Dropout · Multi-Head Attention · Layer Normalization