Universality and Limitations of Prompt Tuning
Yihan Wang, Jatin Chauhan, Wei Wang, Cho-Jui Hsieh

TL;DR
This paper investigates the theoretical capabilities and limitations of prompt tuning in transformer models, establishing conditions for universality and identifying scenarios where prompt tuning fails, supported by empirical evidence.
Contribution
It provides one of the first theoretical analyses of prompt tuning's universality and limitations in transformer architectures, including a lower bound on the number of tunable prompt parameters and conditions under which datasets cannot be memorized.
Findings
There exists a strong transformer for which prompt tuning can approximate any Lipschitz sequence-to-sequence function.
For limited-depth transformers, there are datasets that cannot be memorized by a prompt of any length.
A lower bound on the number of tunable prompt parameters is established and compared with the parameter count of a low-rank update.
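To make the parameter comparison in the last finding concrete, the sketch below counts tunable parameters for a soft prompt and for a low-rank weight update. It does not reproduce the paper's lower bound; it only uses the standard counts (a prompt of length m in embedding dimension d has m * d parameters, and a rank-r update to a d_out x d_in matrix has r * (d_in + d_out)). The function names and the example sizes are illustrative assumptions.

```python
def prompt_param_count(prompt_len: int, embed_dim: int) -> int:
    """Tunable parameters in a soft prompt: one embed_dim vector per prompt token."""
    return prompt_len * embed_dim


def low_rank_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Tunable parameters in a rank-`rank` update B @ A to a d_out x d_in matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_in + d_out)


# Hypothetical sizes, chosen only for illustration.
embed_dim = 768
soft_prompt = prompt_param_count(prompt_len=10, embed_dim=embed_dim)
low_rank = low_rank_param_count(d_out=embed_dim, d_in=embed_dim, rank=4)
print(f"soft prompt: {soft_prompt} parameters, low-rank update: {low_rank} parameters")
```

With these assumed sizes the two budgets are of the same order, which is the regime in which a comparison between the two methods is informative.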
Abstract
Despite the demonstrated empirical efficacy of prompt tuning for adapting a pretrained language model to a new task, the theoretical underpinnings of the difference between "tuning parameters before the input" and "the tuning of model weights" are limited. We thus take one of the first steps to understand the role of soft-prompt tuning for transformer-based architectures. By considering a general-purpose architecture, we analyze prompt tuning through the lens of both universal approximation and limitations with finite-depth fixed-weight pretrained transformers for continuous-valued functions. Our universality result guarantees the existence of a strong transformer with a prompt to approximate any sequence-to-sequence function in the set of Lipschitz functions. The limitations of prompt tuning for limited-depth transformers are first proved by constructing a set of datasets that cannot…
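The distinction the abstract draws between "tuning parameters before the input" and tuning model weights can be illustrated with a minimal sketch: a single-head self-attention layer whose weights stay fixed, with soft-prompt vectors prepended to the input sequence. This is a toy layer written for illustration, not the architecture analyzed in the paper; the identity weight matrices, the 2-dimensional embeddings, and all function names are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head self-attention with a residual connection; weights are never modified."""
    qs = [matvec(Wq, t) for t in tokens]
    ks = [matvec(Wk, t) for t in tokens]
    vs = [matvec(Wv, t) for t in tokens]
    scale = math.sqrt(len(qs[0]))
    out = []
    for q, t in zip(qs, tokens):
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in ks])
        mixed = [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(len(t))]
        out.append([ti + mi for ti, mi in zip(t, mixed)])
    return out


def run_with_prompt(prompt, inputs, Wq, Wk, Wv):
    """Prepend soft-prompt vectors, run the frozen layer, keep only the input positions."""
    out = self_attention(prompt + inputs, Wq, Wk, Wv)
    return out[len(prompt):]


# Frozen weights (identity matrices, an assumption chosen for readability).
identity = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0]]

# Two different soft prompts; only these vectors would be trained in prompt tuning.
out_a = run_with_prompt([[2.0, 0.0]], inputs, identity, identity, identity)
out_b = run_with_prompt([[0.0, 2.0]], inputs, identity, identity, identity)
print(out_a)
print(out_b)
```

The two runs share the same weights and the same inputs, yet the outputs at the input positions differ because the prompt changes what each token attends to. In this sketch that is the only channel through which the prompt can act, which is why the expressive power of a fixed-weight, limited-depth model under prompt tuning is a meaningful question.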
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling · Neural Networks and Applications
