A Bayesian approach for prompt optimization in pre-trained language models
Antonio Sabbatella, Andrea Ponti, Antonio Candelieri, Ilaria Giordani, Francesco Archetti

TL;DR
This paper introduces a Bayesian optimization method for hard prompt tuning in pre-trained language models, enabling efficient discrete prompt selection for classification tasks without requiring access to the full model.
Contribution
It formulates prompt optimization as a combinatorial problem and applies Bayesian optimization in a continuous embedding space, suitable for black-box LLMs like GPT-4.
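To make the size of this combinatorial space concrete: a hard prompt of length L drawn from a vocabulary V has |V|^L possible token sequences if tokens may repeat. The short Python sketch below computes this count. The function name and the example settings are illustrative assumptions, not taken from the paper; 50257 is the size of the GPT-2 byte-pair-encoding vocabulary.

```python
import math


def prompt_search_space_size(vocab_size: int, prompt_length: int) -> int:
    """Number of distinct hard prompts of a fixed length over a vocabulary,
    allowing repeated tokens: |V| ** L."""
    return vocab_size ** prompt_length


# Hypothetical settings, chosen only for illustration.
for vocab_size, length in [(50, 5), (1000, 5), (50257, 5)]:
    n = prompt_search_space_size(vocab_size, length)
    print(f"|V|={vocab_size:>6}, L={length}: {n:.3e} prompts (~10^{math.log10(n):.1f})")
```

Even a five-token prompt over a full BPE vocabulary yields on the order of 10^23 candidates, which rules out exhaustive search and motivates a sample-efficient method such as Bayesian optimization.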
Findings
Bayesian optimization performs well across multiple benchmarks
Tradeoff analysis between search space size, accuracy, and time
Effective for discrete prompt tuning without model access
Abstract
A prompt is a sequence of symbols or tokens, selected from a vocabulary according to some rule, which is prepended or concatenated to a textual query. A key problem is how to select this sequence of tokens; in this paper we formulate it as a combinatorial optimization problem. The high dimensionality of the token space, compounded by the length of the prompt sequence, requires a very efficient solution. We propose a Bayesian optimization method executed in a continuous embedding of the combinatorial space. We focus on hard prompt tuning (HPT), which directly searches for discrete tokens to be added to the text input without requiring access to the large language model (LLM), and can therefore be used also when the LLM is available only as a black box. This is critically important when LLMs are made available in the Model as a Service (MaaS) manner, as with GPT-4. The current…
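The search loop described in the abstract can be illustrated with a minimal sketch. It assumes a simple continuous embedding in which each prompt position gets a coordinate in [0, 1) that is decoded to a token index; this embedding, the tiny vocabulary `VOCAB`, and the toy objective `black_box_score` are hypothetical placeholders, not the authors' construction. In a real run the objective would be something like validation-set classification accuracy obtained by querying the LLM as a black box. The surrogate is a Gaussian process with an RBF kernel, and each new point is chosen by maximizing expected improvement over random candidates.

```python
import math

import numpy as np

# Hypothetical toy vocabulary; a real run would use (a subset of) the LLM tokenizer's vocabulary.
VOCAB = ["great", "terrible", "movie", "review", "sentiment",
         "is", "the", "overall", "Answer:", "label"]
PROMPT_LEN = 3


def decode(z: np.ndarray) -> list[str]:
    """Assumed embedding: coordinate i in [0, 1) selects the token at position i."""
    idx = np.minimum((z * len(VOCAB)).astype(int), len(VOCAB) - 1)
    return [VOCAB[i] for i in idx]


def black_box_score(prompt: list[str]) -> float:
    """Hypothetical stand-in for the black-box objective (e.g. validation accuracy
    of the LLM with `prompt` prepended). Toy version: fraction of positions that
    match a fixed target prompt."""
    target = ["overall", "sentiment", "is"]
    return sum(p == t for p, t in zip(prompt, target)) / len(target)


def rbf(a: np.ndarray, b: np.ndarray, lengthscale: float = 0.2) -> np.ndarray:
    """Squared-exponential kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def gp_posterior(X, y, Xq, noise=1e-4):
    """GP posterior mean and std at Xq, using a constant prior mean equal to mean(y)."""
    y_mean = y.mean()
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    Ks = rbf(X, Xq)
    mu = y_mean + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Closed-form expected improvement over `best` for maximization."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def bayes_opt_prompt(n_init=5, n_iter=25, n_candidates=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Initial design: random points in the continuous space, scored via their decoded prompts.
    X = rng.random((n_init, PROMPT_LEN))
    y = np.array([black_box_score(decode(x)) for x in X])
    for _ in range(n_iter):
        # Approximate the EI maximizer by scoring random candidates (a simplification).
        cand = rng.random((n_candidates, PROMPT_LEN))
        mu, sigma = gp_posterior(X, y, cand)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
        X = np.vstack([X, x_next])
        y = np.append(y, black_box_score(decode(x_next)))
    best = int(np.argmax(y))
    return decode(X[best]), float(y[best])


prompt, score = bayes_opt_prompt()
print(prompt, score)
```

Because the objective is only ever called on decoded token sequences, the loop needs no gradients or model weights, which is what makes this style of search compatible with MaaS access. Maximizing the acquisition over random candidates keeps the sketch dependency-free; a library optimizer would usually be used instead.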
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Lib · Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax
