Adaptable Adapters
Nafise Sadat Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych

TL;DR
This paper introduces adaptable adapters for pretrained NLP models that customize activation functions and select beneficial layers, resulting in more efficient and effective fine-tuning, especially in low-data scenarios.
Contribution
The work proposes adaptable adapters with learnable activation functions and layer switches, improving efficiency and transferability over standard adapters.
Findings
Achieve comparable performance with fewer adapter layers.
Transfer well across different data settings and tasks.
Require about 50% of the parameters of standard adapters.
Abstract
State-of-the-art pretrained NLP models contain from a hundred million to a trillion parameters. Adapters provide a parameter-efficient alternative to full finetuning, in which we finetune only lightweight neural network layers on top of pretrained weights. Adapter layers are initialized randomly. However, existing work uses the same adapter architecture -- i.e., the same adapter layer on top of each layer of the pretrained model -- for every dataset, regardless of the properties of the dataset or the amount of available training data. In this work, we introduce adaptable adapters that contain (1) learnable activation functions, so that different layers and different input data can use different activations, and (2) a learnable switch to select and only use the beneficial adapter layers. We show that adaptable adapters achieve on-par performances with the standard adapter architecture while using a considerably…
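The two components named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since this summary does not give the details: the learnable activation is written as a small rational function, the switch is a sigmoid gate thresholded at inference, and the names `rational_activation`, `AdaptableAdapterLayer`, and `switch_logit` are hypothetical. It uses plain Python lists rather than a deep learning framework and includes no training loop.

```python
import math
import random


def rational_activation(x, num, den):
    """Assumed learnable activation: P(x) / (1 + |Q(x)|).

    num and den hold polynomial coefficients that would be trained;
    the denominator is kept >= 1 so the function has no poles.
    """
    p = sum(a * x ** i for i, a in enumerate(num))
    q = sum(b * x ** (i + 1) for i, b in enumerate(den))
    return p / (1.0 + abs(q))


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class AdaptableAdapterLayer:
    """Bottleneck adapter with its own activation and an on/off switch (sketch)."""

    def __init__(self, hidden, bottleneck, seed=0):
        rng = random.Random(seed)
        # Randomly initialized down- and up-projections.
        self.down = [[rng.gauss(0, 0.02) for _ in range(hidden)]
                     for _ in range(bottleneck)]
        self.up = [[rng.gauss(0, 0.02) for _ in range(bottleneck)]
                   for _ in range(hidden)]
        # Per-layer activation coefficients; this setting starts as identity.
        self.num = [0.0, 1.0, 0.0]
        self.den = [0.0, 0.0]
        # Assumed switch parameter: a positive logit keeps the layer.
        self.switch_logit = 0.0

    def adapter(self, h):
        z = matvec(self.down, h)
        z = [rational_activation(v, self.num, self.den) for v in z]
        return matvec(self.up, z)

    def forward(self, h, training=False):
        gate = 1.0 / (1.0 + math.exp(-self.switch_logit))
        if not training:
            if gate <= 0.5:
                return list(h)  # switched off: the layer is skipped entirely
            gate = 1.0          # switched on: full adapter update
        # Residual connection around the (gated) adapter update.
        return [hi + gate * di for hi, di in zip(h, self.adapter(h))]


layer = AdaptableAdapterLayer(hidden=4, bottleneck=2)
layer.switch_logit = -5.0
print(layer.forward([0.5, -1.0, 2.0, 0.0]))  # input passes through unchanged
```

In this sketch a layer whose switch is off costs nothing at inference, which is one way to read the claim that fewer adapter layers are needed. During training the gate is soft, so the switch parameter could in principle be learned alongside the adapter weights.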
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Natural Language Processing Techniques
Methods: Adapter
