Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li, Percy Liang

TL;DR
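Prefix-tuning is a lightweight alternative to fine-tuning for natural language generation: it keeps the language model's parameters frozen and optimizes only a small continuous task-specific vector (the prefix). Learning about 0.1% of the parameters, it matches fine-tuning with full data and outperforms it in low-data settings.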
Contribution
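The paper introduces prefix-tuning, which keeps the language model's parameters frozen and optimizes only a small task-specific prefix. Because only the prefix is stored per task, no full model copy is needed for each task, and the method improves on fine-tuning in low-data settings and on unseen topics.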
Findings
Achieves performance comparable to fine-tuning in the full-data setting while learning only 0.1% of the parameters.
Outperforms fine-tuning in low-data settings.
Generalizes better than fine-tuning to unseen topics.
Abstract
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples…
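To make the "virtual tokens" idea concrete, here is a minimal NumPy sketch of a single causal attention head in which every real token can also attend to a learned prefix. It is an illustration under stated assumptions, not the authors' implementation: the function name `attention_with_prefix`, the toy dimensions, the single-layer, single-head setup, and the choice to represent the prefix directly as extra key/value rows are all assumptions made for this example. The projection matrices stand in for the frozen language model; only `prefix_k` and `prefix_v` would receive gradient updates.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_prefix(x, W_q, W_k, W_v, prefix_k, prefix_v):
    """Causal single-head attention with learned prefix keys/values prepended.

    x:        (n, d) real token activations
    W_*:      (d, d) frozen projection matrices (stand-ins for the pretrained LM)
    prefix_*: (p, d) trainable prefix rows (hypothetical parameterization)
    """
    n, p = x.shape[0], prefix_k.shape[0]
    q = x @ W_q
    # The prefix rows sit to the left of the real tokens, like virtual tokens.
    k = np.concatenate([prefix_k, x @ W_k], axis=0)
    v = np.concatenate([prefix_v, x @ W_v], axis=0)

    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Prefix positions are visible to every token; real tokens stay causal.
    causal = np.tril(np.ones((n, n), dtype=bool))
    mask = np.concatenate([np.ones((n, p), dtype=bool), causal], axis=1)
    scores = np.where(mask, scores, -np.inf)

    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model, seq_len, prefix_len = 8, 5, 3  # toy sizes, chosen for illustration

# Frozen "pretrained" weights: never updated in prefix-tuning.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
# The only trainable parameters in this sketch.
prefix_k = rng.normal(size=(prefix_len, d_model))
prefix_v = rng.normal(size=(prefix_len, d_model))

x = rng.normal(size=(seq_len, d_model))
out, weights = attention_with_prefix(x, W_q, W_k, W_v, prefix_k, prefix_v)

frozen_params = W_q.size + W_k.size + W_v.size
trainable_params = prefix_k.size + prefix_v.size
print(out.shape, weights.shape, trainable_params, frozen_params)
```

In this toy setup the trainable-to-frozen ratio is large because the model is tiny; the 0.1% figure quoted in the abstract refers to the paper's actual models, not to this sketch. Switching tasks would amount to swapping `prefix_k` and `prefix_v` while the projection matrices stay shared.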
Code & Models
- 🤗 rinna/japanese-gpt-neox-small (model) · 332k downloads · ♡ 15
- 🤗 crumb/ptune-FLAN-OPT-2.7b (model) · 10 downloads · ♡ 1
- 🤗 crumb/ptune-FLAN-OPT-6.7b (model) · 4 downloads · ♡ 2
- 🤗 FZH1996/fed-lora (model)
- 🤗 RichardErkhov/rinna_-_japanese-gpt-neox-small-4bits (model) · 1 download
- 🤗 RichardErkhov/rinna_-_japanese-gpt-neox-small-8bits (model)
- 🤗 RichardErkhov/rinna_-_japanese-gpt-neox-small-awq (model) · 3 downloads
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Cosine Annealing · Weight Decay · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Multi-Head Attention · Attention Is All You Need · Attention Dropout · Layer Normalization
