Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

Rabeeh Karimi Mahabadi; Sebastian Ruder; Mostafa Dehghani; James Henderson

arXiv:2106.04489 · cs.CL · June 9, 2021

TL;DR

This paper introduces a hypernetwork-based multi-task fine-tuning method for transformers: shared hypernetworks generate the adapter parameters for every task and layer, improving multi-task performance while adding few parameters per task.

Contribution

The paper proposes shared hypernetworks that generate task-specific adapters, enabling parameter-efficient multi-task learning and few-shot domain generalization in transformers.

Findings

01 Improves multi-task performance on GLUE while adding only 0.29% parameters per task.

02 Demonstrates substantial improvements in few-shot domain generalization.

03 Shares knowledge across tasks via hypernetworks while keeping task-specific adaptation through generated adapters.

Abstract

State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing information across tasks. In this paper, we show that we can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model. This parameter-efficient multi-task learning framework allows us to achieve the best of both worlds by sharing knowledge across tasks via hypernetworks while enabling the model to adapt to each individual task through task-specific adapters. Experiments on the well-known GLUE benchmark show improved performance in multi-task learning while adding only 0.29% parameters per task. We additionally demonstrate substantial…
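The mechanism described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: all sizes, variable names, the tanh mixing layer, and the ReLU bottleneck adapter are assumptions chosen for readability. It shows one shared set of hypernetwork weights producing adapter weights from a task embedding, a layer-id embedding, and an adapter-position embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not the paper's settings).
NUM_TASKS, NUM_LAYERS, NUM_POSITIONS = 3, 2, 2
EMB_DIM = 8       # size of each conditioning embedding
HIDDEN_DIM = 16   # transformer hidden size
BOTTLENECK = 4    # adapter bottleneck size

# Embeddings for the three conditioning signals named in the abstract:
# task, layer id, and adapter position.
task_emb = rng.normal(size=(NUM_TASKS, EMB_DIM))
layer_emb = rng.normal(size=(NUM_LAYERS, EMB_DIM))
pos_emb = rng.normal(size=(NUM_POSITIONS, EMB_DIM))

# Shared hypernetwork weights, reused for every task, layer, and position.
W_mix = rng.normal(size=(3 * EMB_DIM, EMB_DIM)) * 0.1
W_down_gen = rng.normal(size=(EMB_DIM, HIDDEN_DIM * BOTTLENECK)) * 0.1
W_up_gen = rng.normal(size=(EMB_DIM, BOTTLENECK * HIDDEN_DIM)) * 0.1


def generate_adapter(task, layer, position):
    """Generate down/up projection matrices for one adapter."""
    source = np.concatenate([task_emb[task], layer_emb[layer], pos_emb[position]])
    z = np.tanh(source @ W_mix)  # combined conditioning vector (assumed form)
    down = (z @ W_down_gen).reshape(HIDDEN_DIM, BOTTLENECK)
    up = (z @ W_up_gen).reshape(BOTTLENECK, HIDDEN_DIM)
    return down, up


def apply_adapter(h, down, up):
    """Bottleneck adapter with a residual connection (ReLU is an assumption)."""
    return h + np.maximum(h @ down, 0.0) @ up


# Same layer and position, two different tasks: the generated weights differ.
h = rng.normal(size=(5, HIDDEN_DIM))  # 5 token representations
down_a, up_a = generate_adapter(task=0, layer=1, position=0)
down_b, up_b = generate_adapter(task=1, layer=1, position=0)
out_a = apply_adapter(h, down_a, up_a)

# Cost of adding one more task: a new task embedding in this sketch,
# versus a full set of separately trained adapters.
per_task_params = EMB_DIM
separate_adapter_params = NUM_LAYERS * NUM_POSITIONS * 2 * HIDDEN_DIM * BOTTLENECK
```

In this toy setup the per-task cost is a single embedding vector, while the generator weights are shared; that is the intuition behind the small per-task parameter overhead the abstract reports, though the actual figure depends on the paper's own architecture and sizes.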

Code & Models

Repositories

rabeehk/hyperformer (PyTorch, official)

Taxonomy

Methods: Adapter