Universality and Limitations of Prompt Tuning

Yihan Wang; Jatin Chauhan; Wei Wang; Cho-Jui Hsieh

arXiv:2305.18787 · cs.LG · November 17, 2023 · 1 citation

TL;DR

This paper investigates the theoretical capabilities and limitations of prompt tuning in transformer models, establishing conditions for universality and identifying scenarios where prompt tuning fails, supported by empirical evidence.

Contribution

It provides one of the first theoretical analyses of prompt tuning's universality and limitations in transformer architectures, including a lower bound on the number of tunable prompt parameters and conditions for dataset memorization.

Findings

01

There exists a strong transformer for which prompt tuning can approximate any Lipschitz sequence-to-sequence function.

02

For limited-depth transformers, there exist datasets that cannot be memorized by a prompt of any length.

03

A lower bound on the number of tunable prompt parameters is established and compared with the parameter count of low-rank weight updates.
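The third finding compares parameter budgets. As a rough back-of-the-envelope sketch (not the paper's bound), assume a soft prompt of `m` tokens in a `d`-dimensional embedding space, and a LoRA-style update `W + B @ A` to a single `d_out x d_in` weight matrix with `A` of shape `(r, d_in)` and `B` of shape `(d_out, r)`. The function names and the width `d = 768` are hypothetical choices for illustration.

```python
def prompt_params(prompt_len: int, d_model: int) -> int:
    """Trainable parameters in a soft prompt: one d_model-dim vector per prompt token."""
    return prompt_len * d_model


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a LoRA-style update W + B @ A,
    with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * d_in + d_out * rank


if __name__ == "__main__":
    d = 768  # hypothetical embedding width, chosen only for illustration
    for m in (1, 10, 100):
        print(f"prompt length {m:>3}: {prompt_params(m, d):>7} params")
    for r in (1, 4, 8):
        print(f"LoRA rank {r}: {lora_params(d, d, r):>7} params per d x d matrix")
```

Under these assumptions, a prompt of `m` tokens costs as many parameters as a rank-`m/2` update to one square `d x d` matrix; the paper's actual comparison depends on its own setting and bound.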

Abstract

Despite the demonstrated empirical efficacy of prompt tuning for adapting a pretrained language model to a new task, the theoretical underpinnings of the difference between "tuning parameters before the input" and "tuning model weights" are limited. We thus take one of the first steps toward understanding the role of soft-prompt tuning for transformer-based architectures. Considering a general-purpose architecture, we analyze prompt tuning through the lens of both universal approximation and limitations with finite-depth, fixed-weight pretrained transformers for continuous-valued functions. Our universality result guarantees the existence of a strong transformer with a prompt to approximate any sequence-to-sequence function in the set of Lipschitz functions. The limitations of prompt tuning for limited-depth transformers are first proved by constructing a set of datasets that cannot…
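To make "tuning parameters before the input" concrete, here is a minimal toy sketch, assuming a single self-attention head with a residual connection (no feed-forward block, no layer norm) whose randomly initialized weights stand in for a frozen pretrained layer. All names (`attention_layer`, `prompted_forward`, `W_q`, and so on) are hypothetical. The prompt `P` is prepended to the input, only the outputs at the input positions are kept, and the weights are never modified; training `P` by gradient descent is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # hypothetical embedding dimension

# Frozen stand-in "pretrained" weights for one attention head (random, hypothetical).
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_layer(X: np.ndarray) -> np.ndarray:
    """One frozen, unmasked self-attention head with a residual connection; X is (n, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    A = softmax(Q @ K.T / np.sqrt(d))
    return X + A @ V


def prompted_forward(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Prepend the tunable prompt P (m, d) to the input X (n, d), run the frozen
    layer, and return only the outputs at the original input positions."""
    out = attention_layer(np.concatenate([P, X], axis=0))
    return out[P.shape[0]:]


X = rng.standard_normal((3, d))   # an input sequence of 3 tokens
P0 = np.zeros((2, d))             # one 2-token prompt
P1 = rng.standard_normal((2, d))  # a different 2-token prompt

y0 = prompted_forward(P0, X)
y1 = prompted_forward(P1, X)
print(y0.shape)             # (3, 4)
print(np.allclose(y0, y1))  # changing only the prompt changes the output
```

The prompt influences the input positions only through the attention weights and values it contributes, which is why the expressiveness of prompt tuning depends on the frozen layers it passes through.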

Videos

Universality and Limitations of Prompt Tuning · SlidesLive

Taxonomy

Topics: Topic Modeling · Neural Networks and Applications