Prefix-Tuning: Optimizing Continuous Prompts for Generation

Xiang Lisa Li; Percy Liang

arXiv:2101.00190 · cs.CL · January 5, 2021 · 291 cites

TL;DR

Prefix-tuning is a lightweight method that adapts large language models to generation tasks by optimizing a small continuous vector while the model itself stays frozen. It matches fine-tuning when full data is available and outperforms it in low-data settings, while learning far fewer parameters.

Contribution

The paper introduces prefix-tuning, which keeps the language model's parameters frozen and optimizes only a small task-specific prefix. This removes the need to store a full copy of the model for each task and improves performance in low-data settings and on unseen topics.

Findings

01. Achieves performance comparable to fine-tuning in the full data setting while learning only 0.1% of the parameters.

02. Outperforms fine-tuning in low-data settings.

03. Generalizes better to unseen topics.
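
To give a feel for the 0.1% figure in the first finding, the following is a rough back-of-the-envelope sketch, not the paper's own accounting. It assumes the prefix stores one key vector and one value vector per prefix position at every layer, and it plugs in a hypothetical configuration (24 layers, hidden size 1024, about 345M model parameters, prefix length 5). None of these numbers appear in this summary; they are illustrative assumptions only.

```python
def prefix_param_count(prefix_len, num_layers, hidden_dim):
    """Count prefix parameters under the ASSUMED layout:
    one key vector and one value vector per prefix position per layer."""
    return prefix_len * num_layers * 2 * hidden_dim


def trainable_fraction(prefix_params, model_params):
    """Share of parameters that is trained when the model itself stays frozen."""
    return prefix_params / model_params


# Hypothetical configuration, chosen for illustration only.
n_prefix = prefix_param_count(prefix_len=5, num_layers=24, hidden_dim=1024)
fraction = trainable_fraction(n_prefix, model_params=345_000_000)

print(n_prefix)            # number of trainable prefix values
print(f"{fraction:.4%}")   # the same count as a share of the frozen model
```

Under these assumptions the prefix is on the order of a tenth of a percent of the model, which is why only the prefix, not a full model copy, needs to be stored per task.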

Abstract

Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples…
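
The idea of tokens attending to a prefix "as if it were virtual tokens" can be sketched with a toy single-head attention step. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the prefix enters attention as extra key and value vectors placed before those of the real tokens, and every name and number below (`attend`, `attend_with_prefix`, the toy vectors) is hypothetical. The token keys and values stand in for what the frozen model computes; only the prefix vectors would be updated by an optimizer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Prepend prefix keys/values so the query also attends to the virtual tokens.
    List concatenation builds new lists, so the frozen inputs are left untouched."""
    return attend(query, prefix_keys + keys, prefix_values + values)


# Frozen side: keys and values for two real tokens (toy numbers).
token_keys = [[1.0, 0.0], [0.0, 1.0]]
token_values = [[1.0, 2.0], [3.0, 4.0]]
query = [1.0, 1.0]

# Trainable side: a single virtual token. In this sketch these are the
# only numbers that training would change.
prefix_keys = [[0.5, 0.5]]
prefix_values = [[10.0, 10.0]]

plain_out, plain_weights = attend(query, token_keys, token_values)
prefixed_out, prefixed_weights = attend_with_prefix(
    query, token_keys, token_values, prefix_keys, prefix_values
)

print(plain_weights, plain_out)
print(prefixed_weights, prefixed_out)
```

With the prefix in place, the attention weights are spread over three positions instead of two, so the output is pulled toward the prefix value even though nothing on the frozen side has changed. Training the prefix amounts to choosing those extra vectors so that the frozen model's outputs suit the task.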

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Cosine Annealing · Weight Decay · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Multi-Head Attention · Attention Is All You Need · Attention Dropout · Layer Normalization