TL;DR
This paper introduces a hypernetwork-based multi-task fine-tuning method for transformers. Shared hypernetworks generate the adapter parameters for every task and layer, so knowledge is shared across tasks while each task adds only a small number of parameters.
Contribution
It proposes a shared hypernetwork approach to generate task-specific adapters, enabling efficient multi-task learning and domain generalization in transformers.
Findings
Improves multi-task performance on GLUE while adding only 0.29% parameters per task.
Demonstrates substantial improvements in few-shot domain generalization.
Enables sharing knowledge across tasks via hypernetworks while maintaining task-specific adaptation.
Abstract
State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model. This parameter-efficient multi-task learning framework allows us to achieve the best of both worlds by sharing knowledge across tasks via hypernetworks while enabling the model to adapt to each individual task through task-specific adapters. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task. We additionally demonstrate substantial…
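The conditioning scheme described in the abstract can be illustrated with a minimal sketch. It assumes linear hypernetwork heads, a ReLU bottleneck adapter with a residual connection, and randomly initialized weights in place of trained ones. All names and sizes (`generate_adapter`, `EMB_DIM`, `W_mix`, and so on) are hypothetical and chosen for illustration; they are not taken from the paper or its code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
NUM_TASKS, NUM_LAYERS, NUM_POSITIONS = 3, 2, 2
EMB_DIM, HIDDEN_DIM, BOTTLENECK_DIM = 8, 16, 4

# Embeddings for the three conditioning inputs named in the abstract:
# task, layer id, and adapter position. Random here; learned in practice.
task_emb = rng.normal(size=(NUM_TASKS, EMB_DIM))
layer_emb = rng.normal(size=(NUM_LAYERS, EMB_DIM))
position_emb = rng.normal(size=(NUM_POSITIONS, EMB_DIM))

# Shared across all tasks and layers (assumed design): a projection that
# mixes the three embeddings, plus two linear heads that emit flattened
# adapter weight matrices.
W_mix = rng.normal(size=(3 * EMB_DIM, EMB_DIM)) * 0.1
W_down_gen = rng.normal(size=(EMB_DIM, HIDDEN_DIM * BOTTLENECK_DIM)) * 0.1
W_up_gen = rng.normal(size=(EMB_DIM, BOTTLENECK_DIM * HIDDEN_DIM)) * 0.1


def generate_adapter(task_id, layer_id, position_id):
    """Generate adapter weights from (task, layer, position) ids."""
    cond = np.concatenate(
        [task_emb[task_id], layer_emb[layer_id], position_emb[position_id]]
    )
    z = np.tanh(cond @ W_mix)  # single conditioning vector
    down = (z @ W_down_gen).reshape(HIDDEN_DIM, BOTTLENECK_DIM)
    up = (z @ W_up_gen).reshape(BOTTLENECK_DIM, HIDDEN_DIM)
    return down, up


def apply_adapter(hidden, down, up):
    """Bottleneck adapter: down-project, ReLU, up-project, add residual."""
    return hidden + np.maximum(hidden @ down, 0.0) @ up


# Same hidden states, same layer and position, two different tasks.
hidden = rng.normal(size=(5, HIDDEN_DIM))
down_a, up_a = generate_adapter(task_id=0, layer_id=0, position_id=0)
down_b, up_b = generate_adapter(task_id=1, layer_id=0, position_id=0)
out_a = apply_adapter(hidden, down_a, up_a)
out_b = apply_adapter(hidden, down_b, up_b)
```

In this sketch the only per-task state is one row of `task_emb`; the generator weights are shared by every task, layer, and position, which is the sense in which knowledge can be shared while each task still receives its own adapter weights.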
Taxonomy
Methods: Adapter
