HyperPrompt: Prompt-based Task-Conditioning of Transformers
Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, YaGuang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi

TL;DR
HyperPrompt introduces a novel prompt-based task-conditioning architecture using HyperNetworks for Transformers, enabling efficient multi-task learning with minimal additional parameters and outperforming existing methods on NLP benchmarks.
Contribution
The paper proposes HyperPrompt, a new architecture that uses HyperNetworks to generate hyper-prompts for task conditioning, improving multi-task learning efficiency and performance.
Findings
HyperPrompt achieves superior results on GLUE and SuperGLUE benchmarks.
It uses only 0.14% additional parameters for task conditioning.
HyperPrompt outperforms Prompt-Tuning and HyperFormer++ baselines.
Abstract
Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as 0.14% of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior…
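The mechanism the abstract describes (a HyperNetwork generating task-specific prompts that act as global memories for the queries) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single attention head with no learned query/key/value projections, a hypothetical two-layer MLP as the HyperNetwork, and that the generated prompts are prepended to the keys and values. All names (`make_hypernetwork`, `generate_hyper_prompts`, `prompted_attention`) and all dimensions are illustrative choices, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_hypernetwork(rng, task_dim, hidden_dim, prompt_len, d_model):
    # Hypothetical HyperNetwork: a shared MLP with separate output heads
    # for key prompts and value prompts (an assumption of this sketch).
    return {
        "w1": rng.normal(0.0, 0.1, (task_dim, hidden_dim)),
        "w_k": rng.normal(0.0, 0.1, (hidden_dim, prompt_len * d_model)),
        "w_v": rng.normal(0.0, 0.1, (hidden_dim, prompt_len * d_model)),
    }

def generate_hyper_prompts(params, task_embedding, prompt_len, d_model):
    # Map one task embedding to a pair of prompt matrices.
    h = np.maximum(task_embedding @ params["w1"], 0.0)  # ReLU
    p_k = (h @ params["w_k"]).reshape(prompt_len, d_model)
    p_v = (h @ params["w_v"]).reshape(prompt_len, d_model)
    return p_k, p_v

def prompted_attention(q, k, v, p_k, p_v):
    # Prepend the prompts to keys and values so every query can attend
    # to them as task-level memory slots (assumed placement).
    k_ext = np.concatenate([p_k, k], axis=0)
    v_ext = np.concatenate([p_v, v], axis=0)
    scores = q @ k_ext.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v_ext, weights

rng = np.random.default_rng(0)
seq_len, d_model, prompt_len, task_dim, hidden_dim = 5, 8, 3, 4, 16
params = make_hypernetwork(rng, task_dim, hidden_dim, prompt_len, d_model)
task_embeddings = rng.normal(size=(2, task_dim))  # two illustrative tasks
x = rng.normal(size=(seq_len, d_model))           # shared input tokens

outputs = []
for task_id in range(2):
    p_k, p_v = generate_hyper_prompts(
        params, task_embeddings[task_id], prompt_len, d_model
    )
    out, weights = prompted_attention(x, x, x, p_k, p_v)
    outputs.append(out)
```

In this sketch the HyperNetwork weights are shared by both tasks while only the small task embedding differs, which is one way to read the abstract's claim of flexible information sharing among tasks with few additional parameters.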
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · HyperNetwork · Dropout · Gated Linear Unit · Dense Connections · Softmax
