Comparative Analysis of Efficient Adapter-Based Fine-Tuning of State-of-the-Art Transformer Models
Saad Mashkoor Siddiqui, Mohammad Ali Sheikh, Muhammad Aleem, Kajol R Singh

TL;DR
This paper compares various adapter architectures on transformer models for NLP tasks, showing they often match or outperform fine-tuning while reducing training time, thus offering efficient alternatives for model adaptation.
Contribution
It provides a comprehensive comparison of adapter architectures across multiple transformer models and tasks, highlighting their efficiency and effectiveness as alternatives to fine-tuning.
Findings
Adapters achieve comparable or better performance than fine-tuning.
Adapters significantly reduce training time.
Results are consistent across different NLP tasks.
Abstract
In this work, we investigate the efficacy of various adapter architectures on supervised binary classification tasks from the SuperGLUE benchmark as well as a supervised multi-class news category classification task from Kaggle. Specifically, we compare classification performance and time complexity of three transformer models, namely DistilBERT, ELECTRA, and BART, using conventional fine-tuning as well as nine state-of-the-art (SoTA) adapter architectures. Our analysis reveals performance differences across adapter architectures, highlighting their ability to achieve comparable or better performance relative to fine-tuning at a fraction of the training time. Similar results are observed on the news classification task, further supporting our findings and demonstrating adapters as efficient and flexible alternatives to fine-tuning. This study provides valuable insights and guidelines for…
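To make the efficiency argument concrete, the following is a minimal sketch of a generic bottleneck adapter, which is one common adapter design. It is not necessarily any of the nine architectures compared in the paper, because the abstract does not name them. The sketch assumes DistilBERT-like dimensions (hidden size 768, feed-forward size 3072, 6 layers, 30,522-token vocabulary, 512 positions), one adapter per layer, and a reduction factor of 16. All function names are hypothetical. The backbone stays frozen, and only the small down- and up-projections would be trained.

```python
import random


def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one dense layer."""
    return d_in * d_out + d_out


def adapter_params(hidden: int, bottleneck: int) -> int:
    """Bottleneck adapter: down-projection hidden -> bottleneck, then up-projection back."""
    return linear_params(hidden, bottleneck) + linear_params(bottleneck, hidden)


def encoder_params(vocab: int, max_pos: int, hidden: int, ffn: int, layers: int) -> int:
    """Rough count for a BERT-style encoder, excluding any pooler or task head (assumed layout)."""
    embeddings = vocab * hidden + max_pos * hidden + 2 * hidden  # token + position + LayerNorm
    attention = 4 * linear_params(hidden, hidden) + 2 * hidden  # Q, K, V, output + LayerNorm
    feed_forward = linear_params(hidden, ffn) + linear_params(ffn, hidden) + 2 * hidden
    return embeddings + layers * (attention + feed_forward)


def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Return h + up(relu(down(h))) for one token vector h.

    w_down holds one weight column (length len(h)) per bottleneck unit;
    w_up holds one weight column (length bottleneck) per hidden unit.
    """
    z = [max(0.0, sum(x * w for x, w in zip(h, col)) + b) for col, b in zip(w_down, b_down)]
    delta = [sum(zj * w for zj, w in zip(z, col)) + b for col, b in zip(w_up, b_up)]
    return [x + d for x, d in zip(h, delta)]  # residual connection around the adapter


if __name__ == "__main__":
    # Assumed DistilBERT-like sizes; one adapter per layer with reduction factor 16.
    hidden, layers, reduction = 768, 6, 16
    bottleneck = hidden // reduction
    frozen = encoder_params(30522, 512, hidden, 3072, layers)
    trainable = layers * adapter_params(hidden, bottleneck)
    print(f"frozen backbone: {frozen:,}  trainable adapters: {trainable:,}  "
          f"ratio: {trainable / frozen:.2%}")

    # One common initialization choice: a zero up-projection makes the adapter an exact identity,
    # so inserting it does not change the pretrained model's outputs before training starts.
    random.seed(0)
    h = [random.gauss(0.0, 1.0) for _ in range(hidden)]
    w_down = [[random.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(bottleneck)]
    w_up = [[0.0] * bottleneck for _ in range(hidden)]
    out = adapter_forward(h, w_down, [0.0] * bottleneck, w_up, [0.0] * hidden)
    assert out == h
```

Under these assumptions, the adapters add well under 1% of the backbone's parameters. In practice, the training-time savings come mainly from skipping weight gradients and optimizer state for the frozen backbone. The forward and backward passes still traverse the full model, so the wall-clock speedup is typically smaller than the parameter ratio alone would suggest.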
Taxonomy
Topics: Image and Signal Denoising Methods · Structural Health Monitoring Techniques · Magnetic Properties and Applications
Methods: Attention Is All You Need · Dropout · Byte Pair Encoding · Linear Layer · Weight Decay · BART · Multi-Head Attention · BERT · Layer Normalization
