A Bayesian approach for prompt optimization in pre-trained language models

Antonio Sabbatella; Andrea Ponti; Antonio Candelieri; Ilaria Giordani; Francesco Archetti

arXiv:2312.00471·cs.LG·December 4, 2023·1 cites

Open Access

TL;DR

This paper introduces a Bayesian optimization method for hard prompt tuning in pre-trained language models, enabling efficient discrete prompt selection for classification tasks without requiring access to the full model.

Contribution

It formulates prompt optimization as a combinatorial problem and applies Bayesian optimization in a continuous embedding space, suitable for black-box LLMs like GPT-4.
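
To make the idea concrete, here is a minimal sketch of Bayesian optimization over a continuous relaxation of a discrete prompt space. It is not the authors' implementation: the summary above does not specify their embedding, surrogate model, or acquisition function. This sketch assumes a naive mapping (each coordinate in [0, 1] is binned to one token), a Gaussian process with an RBF kernel, and an upper-confidence-bound acquisition maximized over random candidates. The vocabulary, `black_box_score`, and all other names are hypothetical; the scorer is a toy stand-in for querying a black-box LLM and measuring task accuracy.

```python
import math
import random

# Hypothetical toy vocabulary and prompt length (assumptions, not from the paper).
VOCAB = ["great", "bad", "movie", "review", "sentiment", "the", "is", "classify"]
PROMPT_LEN = 3


def decode(x):
    """Map a point in [0, 1]^PROMPT_LEN to a discrete prompt, one token per coordinate."""
    return [VOCAB[min(int(v * len(VOCAB)), len(VOCAB) - 1)] for v in x]


def black_box_score(prompt):
    """Toy stand-in for an LLM query: fraction of positions matching a hidden target."""
    target = ["classify", "sentiment", "review"]
    return sum(p == t for p, t in zip(prompt, target)) / len(target)


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two points."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2 * length_scale ** 2))


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-3):
    """Factor the kernel matrix once; returns (L, alpha, y_mean)."""
    y_mean = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [yi - y_mean for yi in y]))
    return L, alpha, y_mean


def gp_predict(X, model, x_new):
    """Posterior mean and variance of the surrogate at x_new."""
    L, alpha, y_mean = model
    k = [rbf(a, x_new) for a in X]
    mean = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)
    return mean, var


def bayes_opt(n_init=5, n_iter=20, n_cand=200, beta=2.0, seed=0):
    """Run the loop; returns (best_prompt, best_score) among evaluated prompts."""
    rng = random.Random(seed)

    def sample():
        return [rng.random() for _ in range(PROMPT_LEN)]

    X = [sample() for _ in range(n_init)]
    y = [black_box_score(decode(x)) for x in X]
    for _ in range(n_iter):
        model = gp_fit(X, y)

        def ucb(x):
            mean, var = gp_predict(X, model, x)
            return mean + beta * math.sqrt(var)

        x_next = max((sample() for _ in range(n_cand)), key=ucb)
        X.append(x_next)
        y.append(black_box_score(decode(x_next)))  # one black-box query per iteration
    best = max(range(len(y)), key=lambda i: y[i])
    return decode(X[best]), y[best]


if __name__ == "__main__":
    print(bayes_opt())
```

The only interaction with the model is the call to the scorer, which is why this style of search needs no gradients or internal access. In a real setting, each evaluation would be a costly batch of LLM queries, so the evaluation budget (`n_init + n_iter` here) is the quantity to keep small.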

Findings

01 Bayesian optimization performs well across multiple benchmarks

02 Tradeoff analysis between search space size, accuracy, and time

03 Effective for discrete prompt tuning without model access

Abstract

A prompt is a sequence of symbols or tokens, selected from a vocabulary according to some rule, which is prepended/concatenated to a textual query. A key problem is how to select the sequence of tokens: in this paper we formulate it as a combinatorial optimization problem. The high dimensionality of the token space, compounded by the length of the prompt sequence, requires a very efficient solution. We propose a Bayesian optimization method, executed in a continuous embedding of the combinatorial space. We focus on hard prompt tuning (HPT), which directly searches for discrete tokens to be added to the text input without requiring access to the large language model (LLM) and can also be used when the LLM is available only as a black box. This is critically important if LLMs are made available in the Model as a Service (MaaS) manner, as in GPT-4. The current…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax