Large Language Models Are Human-Level Prompt Engineers

Yongchao Zhou; Andrei Ioan Muresanu; Ziwen Han; Keiran Paster; Silviu Pitis; Harris Chan; Jimmy Ba

arXiv:2211.01910 · cs.LG · March 13, 2023 · 297 citations

TL;DR

This paper introduces Automatic Prompt Engineer (APE), an automated method for generating and selecting effective prompts for large language models, significantly reducing human effort and improving task performance across multiple NLP benchmarks.

Contribution

We propose APE, an approach in which an LLM proposes candidate instructions and the candidate that maximizes a chosen score function is selected. APE outperforms the prior LLM baseline and matches or exceeds human-designed prompts on most of the evaluated tasks.

Findings

1. APE-generated instructions outperform the prior LLM baseline prompts.

2. APE matches or exceeds human-designed prompts on 19/24 tasks.

3. Prompts generated by APE can steer models toward truthfulness and informativeness and improve few-shot learning performance.

Abstract

By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the "program," optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Experiments on 24 NLP tasks show that our automatically generated instructions…
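The propose, score, and select loop described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: models are modeled as plain functions from prompt text to completion text, `PROPOSAL_TEMPLATE` is a hypothetical paraphrase rather than the paper's wording, and the score function is assumed to be exact-match accuracy of a second (target) model following each candidate zero-shot on held-out pairs. The helper names (`propose`, `execution_accuracy`, `ape_select`) and the toy models are illustrative only, and further refinements described in the paper are omitted.

```python
import itertools
from typing import Callable, Sequence

# Assumption: a model is any function mapping prompt text to completion text.
LLM = Callable[[str], str]
Pair = tuple[str, str]  # (input, expected output)

# Hypothetical proposal template; it paraphrases the idea and is not the paper's exact wording.
PROPOSAL_TEMPLATE = (
    "I gave a friend an instruction. Following it, they produced these "
    "input-output pairs:\n\n{demos}\n\nThe instruction was:"
)


def propose(proposer: LLM, demos: Sequence[Pair], n: int) -> list[str]:
    """Query the proposal model n times and keep distinct, non-empty candidates."""
    shown = "\n".join(f"Input: {q}\nOutput: {a}" for q, a in demos)
    prompt = PROPOSAL_TEMPLATE.format(demos=shown)
    candidates: list[str] = []
    for _ in range(n):
        text = proposer(prompt).strip()
        if text and text not in candidates:
            candidates.append(text)
    return candidates


def execution_accuracy(target: LLM, instruction: str, data: Sequence[Pair]) -> float:
    """Fraction of held-out pairs the target model answers exactly, zero-shot."""
    hits = sum(
        target(f"{instruction}\n\nInput: {q}\nOutput:").strip() == a for q, a in data
    )
    return hits / len(data)


def ape_select(proposer: LLM, target: LLM, demos: Sequence[Pair],
               held_out: Sequence[Pair], n: int = 8) -> tuple[str, float]:
    """Propose candidates, score each on held-out data, return the highest scorer."""
    scored = [(c, execution_accuracy(target, c, held_out))
              for c in propose(proposer, demos, n)]
    return max(scored, key=lambda item: item[1])


# Toy stand-ins for real models (hypothetical, for illustration only).
_ideas = itertools.cycle(["Repeat the word.", "Write the antonym of the word."])


def toy_proposer(prompt: str) -> str:
    return next(_ideas)


ANTONYMS = {"hot": "cold", "big": "small", "up": "down"}


def toy_target(prompt: str) -> str:
    # Extract the word after the last "Input: " and answer according to the instruction.
    word = prompt.rsplit("Input: ", 1)[1].split("\n", 1)[0]
    return ANTONYMS.get(word, "?") if "antonym" in prompt else word


best, score = ape_select(toy_proposer, toy_target, demos=[("hot", "cold")],
                         held_out=[("big", "small"), ("up", "down")], n=4)
print(best, score)  # Write the antonym of the word. 1.0
```

Because the abstract leaves the score function as a choice, `execution_accuracy` is only one option; another scorer (for example, the target model's log-probability of the expected output) could be passed in its place. The sketch assumes the proposer returns at least one non-empty candidate, since `max` raises `ValueError` on an empty list.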

Videos

Large Language Models Are Human-Level Prompt Engineers (SlidesLive)

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Explainable Artificial Intelligence (XAI)