IAPT: Instruction-Aware Prompt Tuning for Large Language Models

Wei Zhu; Aaron Xuxiang Tian; Congrui Yin; Yuan Ni; Xiaoling Wang; Guotong Xie

arXiv:2405.18203 · cs.CL · June 10, 2024 · 1 citation

TL;DR

IAPT introduces an instruction-aware prompt tuning method that uses only four soft tokens and generates input-specific prompts with layer-wise prompt generators. It outperforms recent baselines and is more efficient than LoRA in multi-tenant settings.

Contribution

The paper proposes a new prompt tuning approach that installs a soft prompt generator at each Transformer layer, requires only four soft tokens, and learns the generators' activation functions automatically.
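This page does not say how the activation functions are learned. One common way to make an activation learnable is a rational function P(x)/Q(x) whose coefficients are trained together with the rest of the module. The sketch below assumes that form: the class name `RationalActivation`, the cubic numerator, the quadratic term inside the denominator, and the choice to keep the denominator at least 1 are illustrative assumptions, not the authors' code. Only the forward pass is shown.

```python
import numpy as np


class RationalActivation:
    """Hypothetical learnable activation f(x) = P(x) / Q(x).

    Assumed form: P(x) = a0 + a1*x + a2*x^2 + a3*x^3 and
    Q(x) = 1 + |b1*x + b2*x^2|, so Q(x) >= 1 and never vanishes.
    In training, a and b would be updated by gradient descent.
    """

    def __init__(self, num_coeffs, den_coeffs):
        self.a = np.asarray(num_coeffs, dtype=float)  # numerator coefficients a0..a3
        self.b = np.asarray(den_coeffs, dtype=float)  # denominator coefficients b1..b2

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # Numerator polynomial, starting at the constant term x**0.
        p = sum(c * x**k for k, c in enumerate(self.a))
        # Denominator starts at x**1; the absolute value keeps Q(x) >= 1.
        q = 1.0 + np.abs(sum(c * x**(k + 1) for k, c in enumerate(self.b)))
        return p / q
```

With `a = [0, 1, 0, 0]` and `b = [0, 0]` the function is the identity; with `b = [1, 0]` it becomes softsign, x / (1 + |x|). Training the coefficients lets each generator settle on its own shape between and beyond such cases.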

Findings

01. Outperforms recent baselines with a similar number of tunable parameters
02. More efficient than LoRA in multi-tenant settings
03. Effective across various tasks
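Comparisons "with a similar number of tunable parameters" come down to counting trainable weights per method. The helper below is a back-of-the-envelope sketch under assumptions not stated on this page: an IAPT-style generator per layer with one pooling query per soft token, bias-free down and up projections, and an optional handful of activation coefficients; and LoRA with rank `r` on selected projection matrices, where each adapted `d_out x d_in` matrix adds `r * (d_in + d_out)` weights. The function names and the example dimensions are hypothetical.

```python
def iapt_layer_params(d_model: int, bottleneck: int, num_tokens: int = 4,
                      act_coeffs: int = 0) -> int:
    """Assumed per-layer generator size: pooling queries + two bias-free projections."""
    pooling = num_tokens * d_model          # one learnable query per soft token
    projections = 2 * d_model * bottleneck  # down (d -> r) and up (r -> d)
    return pooling + projections + act_coeffs


def lora_layer_params(d_in: int, d_out: int, rank: int, num_matrices: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to each adapted matrix."""
    return num_matrices * rank * (d_in + d_out)


def total_params(per_layer: int, num_layers: int) -> int:
    """Same module repeated in every Transformer layer."""
    return per_layer * num_layers


# Example dimensions only: hidden size 4096, 32 layers.
print(total_params(iapt_layer_params(4096, bottleneck=16), 32))        # 4718592
print(total_params(lora_layer_params(4096, 4096, rank=8, num_matrices=2), 32))  # 4194304
```

Under these assumptions a bottleneck of 16 lands in the same range as rank-8 LoRA on two projection matrices per layer, which is the kind of budget matching such comparisons require.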

Abstract

Soft prompt tuning is a widely studied parameter-efficient fine-tuning method. However, it has a clear drawback: many soft tokens must be inserted into the input sequences to guarantee downstream performance. As a result, soft prompt tuning receives less attention than Low-Rank Adaptation (LoRA) in the era of large language models (LLMs). In this work, we propose a novel prompt tuning method, Instruction-Aware Prompt Tuning (IAPT), that requires only four soft tokens. First, we install a parameter-efficient soft prompt generator at each Transformer layer to generate idiosyncratic soft prompts for each input instruction. The generated soft prompts can be seen as a semantic summary of the input instructions and can effectively guide the output generation. Second, the soft prompt generators are modules with a bottleneck architecture consisting of a self-attention pooling operation, two linear…
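The abstract names the generator's ingredients (self-attention pooling, two linear projections in a bottleneck, and an activation) but is cut off before the details. The NumPy sketch below shows one layer's generator under stated assumptions: pooling uses one learnable query per soft token (four by default), the bottleneck is a down-projection, activation, and up-projection, the instruction occupies the first `instr_len` positions, and the generated tokens are prepended to that layer's hidden states. `SoftPromptGenerator`, `prepend_prompts`, and `tanh` as the default activation are hypothetical choices, not the authors' implementation; biases, training, and the handling of the previous layer's soft tokens are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class SoftPromptGenerator:
    """Hypothetical per-layer generator: attention pooling -> down-proj -> act -> up-proj."""

    def __init__(self, d_model, bottleneck, num_tokens=4, act=np.tanh, seed=0):
        rng = np.random.default_rng(seed)
        # One learnable pooling query per generated soft token.
        self.queries = rng.normal(0.0, 0.02, (num_tokens, d_model))
        self.w_down = rng.normal(0.0, 0.02, (d_model, bottleneck))
        self.w_up = rng.normal(0.0, 0.02, (bottleneck, d_model))
        self.act = act

    def __call__(self, instr_hidden):
        # instr_hidden: (n, d_model) hidden states of the instruction tokens at this layer.
        d = instr_hidden.shape[-1]
        scores = self.queries @ instr_hidden.T / np.sqrt(d)  # (m, n)
        weights = softmax(scores, axis=-1)                   # each row sums to 1 over n tokens
        pooled = weights @ instr_hidden                      # (m, d): one summary per soft token
        return self.act(pooled @ self.w_down) @ self.w_up    # (m, d) soft prompts


def prepend_prompts(hidden, generator, instr_len):
    """Build soft tokens from the first instr_len positions and prepend them."""
    prompts = generator(hidden[:instr_len])
    return np.concatenate([prompts, hidden], axis=0)


hidden = np.random.default_rng(1).normal(size=(10, 64))  # 10 tokens, d_model = 64
gen = SoftPromptGenerator(d_model=64, bottleneck=8)
out = prepend_prompts(hidden, gen, instr_len=6)
print(out.shape)  # (14, 64): 4 generated soft tokens + 10 original tokens
```

Because the prompts are computed from each instruction's own hidden states, two different instructions yield two different sets of four soft tokens, while the frozen backbone weights stay shared.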

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections