TL;DR
This paper offers a Bayesian perspective on prompt tuning and in-context learning. It shows that prompting has fundamental limits that only weight tuning can overcome, and it explains how meta-trained neural networks adapt rapidly in context. Experiments on LSTMs and Transformers support the theory.
Contribution
It introduces a Bayesian framework for understanding prompt tuning, explaining the behavior of meta-trained models and the effectiveness of soft prefixes in prompt optimization.
Findings
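Meta-trained networks behave as Bayesian predictors over the pretraining distribution, and their hallmark feature is rapid in-context adaptation.
Soft prefixes can manipulate activations in ways that hard-token prompts cannot.
Theoretical criteria identify the target tasks for which optimal prompting is possible and those for which it is not.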
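A minimal sketch of what "Bayesian predictor with rapid in-context adaptation" means, assuming a toy pretraining distribution of coin-flip sequences whose bias is drawn from a Beta(alpha, beta) prior (an illustrative assumption, not necessarily the paper's exact setup). Under that assumption the Bayes-optimal next-token predictor has a closed form: the posterior predictive `(alpha + #ones) / (alpha + beta + n)`. A meta-trained network would have to approximate this mapping from context to prediction. `bayes_predict` is a hypothetical helper name.

```python
from fractions import Fraction

def bayes_predict(context, alpha=1, beta=1):
    """Posterior predictive P(next = 1 | context) for a coin whose bias has a
    Beta(alpha, beta) prior (assumed toy pretraining distribution).
    Returns an exact Fraction: (alpha + #ones) / (alpha + beta + len(context))."""
    ones = sum(context)
    return Fraction(alpha + ones, alpha + beta + len(context))

# In-context adaptation: with each observed token the prediction moves from the
# prior mean (1/2) toward the empirical frequency of ones in the context.
context = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
for n in range(len(context) + 1):
    print(n, float(bayes_predict(context[:n])))
```

With the uniform prior this is Laplace's rule of succession. A handful of context tokens already shifts the prediction substantially, which is the rapid adaptation the finding refers to.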
Abstract
Prompting is one of the main ways to adapt a pretrained model to target tasks. Besides manually constructing prompts, many prompt optimization methods have been proposed in the literature. Method development is mainly empirically driven, with less emphasis on a conceptual understanding of prompting. In this paper we discuss how optimal prompting can be understood through a Bayesian view, which also implies some fundamental limitations of prompting that can only be overcome by tuning weights. The paper explains in detail how meta-trained neural networks behave as Bayesian predictors over the pretraining distribution, whose hallmark feature is rapid in-context adaptation. Optimal prompting can be studied formally as conditioning these Bayesian predictors, yielding criteria for target tasks where optimal prompting is and is not possible. We support the theory with educational experiments…
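The idea of optimal prompting as conditioning a Bayesian predictor can be illustrated with a toy example, assuming a hypothetical pretraining distribution that is a uniform mixture over three coin biases {0.2, 0.5, 0.8}. This is a simplified illustration under stated assumptions, not the paper's formal criterion. A prompt is simply a prefix the predictor is conditioned on. For a target task inside the pretraining support (bias 0.8), a prompt can place the predictor in the target-optimal state. For a target outside the support (bias 0.65), long streams with that frequency push the posterior toward the nearest component in KL divergence (0.5), so no prompt keeps the predictor matched to the target. `posterior`, `predict`, and `prompt_with_frequency` are hypothetical names.

```python
import math

# Assumed toy pretraining distribution: uniform prior over three coin biases.
BIASES = [0.2, 0.5, 0.8]
PRIOR = [1 / 3, 1 / 3, 1 / 3]

def posterior(prompt):
    """Posterior over BIASES after conditioning on a 0/1 prompt (log-sum-exp for stability)."""
    ones = sum(prompt)
    zeros = len(prompt) - ones
    logs = [math.log(w) + ones * math.log(b) + zeros * math.log(1 - b)
            for w, b in zip(PRIOR, BIASES)]
    m = max(logs)
    unnorm = [math.exp(l - m) for l in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def predict(prompt):
    """Bayes-optimal P(next = 1 | prompt): posterior-weighted average of the biases."""
    return sum(p * b for p, b in zip(posterior(prompt), BIASES))

def prompt_with_frequency(p, n):
    """Deterministic length-n sequence with round(p * n) ones (stand-in for prompt plus target data)."""
    k = round(p * n)
    return [1] * k + [0] * (n - k)

# In-support target (bias 0.8): conditioning drives the prediction to the target.
print(predict(prompt_with_frequency(0.8, 200)))
# Out-of-support target (bias 0.65): the prediction drifts to 0.5, not 0.65.
print(predict(prompt_with_frequency(0.65, 1000)))
```

The first call prints a value very close to 0.8 and the second a value very close to 0.5. This mirrors, in miniature, the kind of limitation the abstract says can only be overcome by tuning weights, that is, by changing the prior itself.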
