HyperPrompt: Prompt-based Task-Conditioning of Transformers

Yun He; Huaixiu Steven Zheng; Yi Tay; Jai Gupta; Yu Du; Vamsi Aribandi; Zhe Zhao; YaGuang Li; Zhao Chen; Donald Metzler; Heng-Tze Cheng; Ed H. Chi

arXiv:2203.00759 · cs.CL · June 16, 2022 · 29 cites

Open Access

TL;DR

HyperPrompt introduces a prompt-based task-conditioning architecture for Transformers in which a HyperNetwork generates the prompts, enabling parameter-efficient multi-task learning with as few as 0.14% additional task-conditioning parameters and outperforming existing methods on the GLUE and SuperGLUE benchmarks.

Contribution

The paper proposes HyperPrompt, an architecture that uses HyperNetworks to generate hyper-prompts that condition the Transformer's self-attention on the task, improving both the efficiency and the performance of multi-task learning.

Findings

1. HyperPrompt achieves superior results on the GLUE and SuperGLUE benchmarks.
2. It uses as few as 0.14% additional parameters for task conditioning.
3. HyperPrompt outperforms the Prompt-Tuning and HyperFormer++ baselines.

Abstract

Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as 0.14% of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior…
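The abstract's core idea, task-specific prompts that the queries of self-attention attend to as "task global memories", with the prompts produced by a HyperNetwork, can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation. It assumes a single attention head with identity projections, prompts prepended to the keys and values only, and a hypothetical one-layer linear `hypernetwork` that maps a task embedding to a prompt matrix; the paper's actual HyperNetwork design may differ. All function names, dimensions, and task embeddings are illustrative.

```python
# Minimal sketch under stated assumptions (single head, identity Q/K/V
# projections, hypothetical linear hypernetwork); not the paper's code.
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hypernetwork(task_emb, weight, num_prompts, d_model):
    """Hypothetical one-layer linear HyperNetwork (an assumption): maps a
    task embedding to a (num_prompts x d_model) hyper-prompt matrix."""
    flat = matmul([task_emb], weight)[0]  # length num_prompts * d_model
    return [flat[i * d_model:(i + 1) * d_model] for i in range(num_prompts)]


def attend_with_hyper_prompts(queries, keys, values, prompt_k, prompt_v):
    """Scaled dot-product attention with hyper-prompts prepended to the keys
    and values, so every query can also attend to the task-specific memory
    slots. The output has one row per query."""
    keys = prompt_k + keys
    values = prompt_v + values
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return outputs


def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix used as toy data and toy weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy example: two tasks share one pair of hypernetwork weights and differ
# only in their (hypothetical) one-hot task embeddings.
rng = random.Random(0)
d_model, seq_len, num_prompts, task_dim = 4, 3, 2, 3
x = rand_matrix(seq_len, d_model, rng)                   # token representations
w_k = rand_matrix(task_dim, num_prompts * d_model, rng)  # shared across tasks
w_v = rand_matrix(task_dim, num_prompts * d_model, rng)  # shared across tasks

outputs = {}
for name, task_emb in {"task_a": [1.0, 0.0, 0.0], "task_b": [0.0, 1.0, 0.0]}.items():
    p_k = hypernetwork(task_emb, w_k, num_prompts, d_model)
    p_v = hypernetwork(task_emb, w_v, num_prompts, d_model)
    # Identity projections keep the sketch short; a real layer would first
    # apply learned query/key/value projections to x.
    outputs[name] = attend_with_hyper_prompts(x, x, x, p_k, p_v)

for name, out in outputs.items():
    print(name, [round(v, 3) for v in out[0]])
```

Because the hypernetwork weights are shared and only the task embedding changes, the tasks share parameters through the generator while each still receives distinct prompts, which is one way to read the abstract's "flexible information sharing among tasks". The prompts lengthen only the key/value sequence, so the output keeps one row per input token.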

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · HyperNetwork · Dropout · Gated Linear Unit · Dense Connections · Softmax