Parameter-Efficient Fine-Tuning without Introducing New Latency
Baohao Liao, Yan Meng, Christof Monz

TL;DR
This paper introduces a parameter-efficient fine-tuning method that uses a shared, task-agnostic sparse mask and a novel adapter technique, achieving state-of-the-art results without increasing inference latency or storage requirements.
Contribution
Proposes a PEFT approach that combines a sparse mask shared across all downstream tasks with an adapter applied directly to the pre-trained parameters, improving performance and efficiency without added inference latency.
Findings
Surpasses existing PEFT methods on the GLUE benchmark
Stores only 0.03% as many parameters as full fine-tuning
Achieves state-of-the-art performance in efficiency and accuracy
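To give a feel for what the 0.03% storage figure means, here is a rough back-of-the-envelope sketch. The model size of 125 million parameters is a hypothetical round number chosen for illustration, not a figure from the paper, and the helper name `stored_parameters` is likewise made up.

```python
def stored_parameters(total_params: int, stored_percent: float) -> int:
    """Number of parameters kept per task when only a percentage is stored."""
    return round(total_params * stored_percent / 100)


# Hypothetical model size, chosen only for illustration (not from the paper).
HYPOTHETICAL_TOTAL = 125_000_000

# 0.03% is the storage figure quoted in the findings above.
per_task = stored_parameters(HYPOTHETICAL_TOTAL, 0.03)
print(per_task)
```

Under this assumption, each additional task would cost tens of thousands of stored values rather than a full copy of the model.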
Abstract
Parameter-efficient fine-tuning (PEFT) of pre-trained language models has recently demonstrated remarkable achievements, effectively matching the performance of full fine-tuning while utilizing significantly fewer trainable parameters, and consequently addressing the storage and communication constraints. Nonetheless, various PEFT methods are limited by their inherent characteristics. In the case of sparse fine-tuning, which involves modifying only a small subset of the existing parameters, the selection of fine-tuned parameters is task- and domain-specific, making it unsuitable for federated learning. On the other hand, PEFT methods with adding new parameters typically introduce additional inference latency. In this paper, we demonstrate the feasibility of generating a sparse mask in a task-agnostic manner, wherein all downstream tasks share a common mask. Our approach, which relies…
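The idea of one sparse mask shared by all downstream tasks can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are a flat list, the mask indices are hand-picked (how the paper actually generates the mask is not covered by the excerpt above), and the names `extract_sparse_update` and `merge_sparse_update` are hypothetical.

```python
from typing import List

Weights = List[float]
Mask = List[int]  # indices of the entries that are allowed to change


def extract_sparse_update(finetuned: Weights, mask: Mask) -> List[float]:
    """Keep only the fine-tuned values at the masked positions (what gets stored)."""
    return [finetuned[i] for i in mask]


def merge_sparse_update(pretrained: Weights, mask: Mask, values: List[float]) -> Weights:
    """Write stored task-specific values back into a copy of the pre-trained weights."""
    if len(mask) != len(values):
        raise ValueError("mask and values must have the same length")
    merged = list(pretrained)
    for idx, value in zip(mask, values):
        merged[idx] = value
    return merged


# Toy pre-trained weights and a hand-picked mask (both assumptions for illustration).
pretrained = [0.5, -0.1, 0.9, 0.02]
shared_mask = [1, 3]  # the same mask is reused for every task

# Pretend these are the weights after fine-tuning on two different tasks.
task_a_finetuned = [0.5, -0.3, 0.9, 0.4]
task_b_finetuned = [0.5, 0.2, 0.9, -0.6]

# Per task, only the values at the masked positions need to be stored.
task_a_values = extract_sparse_update(task_a_finetuned, shared_mask)
task_b_values = extract_sparse_update(task_b_finetuned, shared_mask)

# At load time the values are merged back; the model keeps its original shape.
task_a_model = merge_sparse_update(pretrained, shared_mask, task_a_values)
task_b_model = merge_sparse_update(pretrained, shared_mask, task_b_values)
```

Because the merged weights have the same shape as the pre-trained ones, this kind of update adds no parameters to the forward pass, and because the mask is fixed in advance, every task or client updates the same positions.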
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning
Methods: Adapter · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
