TL;DR
AdapterDrop is a method that improves transformer efficiency by removing adapters from lower layers during training and inference, reducing computational costs with minimal performance loss.
Contribution
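This paper introduces AdapterDrop, which removes adapters from the lower transformer layers during training and inference. The approach combines ideas from three efficiency directions (smaller models, dynamic size reduction, and light-weight adapters) and targets inference over multiple tasks at once.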
Findings
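Dynamically reduces computational overhead when inference runs over multiple tasks simultaneously.
Keeps the decrease in task performance minimal when adapters are dropped from lower layers.
Pruning adapters from AdapterFusion improves inference efficiency while fully maintaining task performance.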
Abstract
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters. In this paper, we propose AdapterDrop, removing adapters from lower transformer layers during training and inference, which incorporates concepts from all three directions. We show that AdapterDrop can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performances. We further prune adapters from AdapterFusion, which improves the inference efficiency while maintaining the task performances entirely.
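The multi-task saving described in the abstract can be illustrated with a rough cost count. The sketch below is not taken from the paper's code. It assumes that layers without adapters produce the same output for every task and can therefore be computed once, while layers that still hold task-specific adapters run once per task. The function names, the 12-layer depth, and the 8-task setting are hypothetical, and the cost of the adapters themselves is ignored.

```python
from __future__ import annotations


def layer_passes(num_layers: int, num_tasks: int, dropped: int) -> int:
    """Count transformer-layer forward passes for one input served to several tasks.

    Assumption (illustrative cost model only): the first `dropped` layers carry
    no adapters, so their output is shared by all tasks and computed once; the
    remaining layers hold task-specific adapters and run once per task.
    """
    if not 0 <= dropped <= num_layers:
        raise ValueError("dropped must be between 0 and num_layers")
    shared = dropped                              # computed once for all tasks
    per_task = (num_layers - dropped) * num_tasks  # repeated for every task
    return shared + per_task


def active_adapter_layers(num_layers: int, dropped: int) -> list[int]:
    """0-based indices of the layers that still apply an adapter."""
    return list(range(dropped, num_layers))


if __name__ == "__main__":
    # Hypothetical setting: 12 layers, 8 tasks, varying number of dropped layers.
    for n in (0, 5, 11):
        print(f"dropped={n:2d}  layer passes={layer_passes(12, 8, n)}")
```

Under this simplified model, the count falls as more lower layers go without adapters, which is the sense in which the number of dropped layers acts as a knob on inference cost. It says nothing about task performance, which has to be measured separately.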
