$p1$: Better Prompt Optimization with Fewer Prompts
Zhaolin Gao, Yu (Sid) Wang, Bo Liu, Thorsten Joachims, Kianté Brantley, Wen Sun

TL;DR
This paper introduces $p1$, a prompt filtering method that enhances prompt optimization by selecting high-variance user prompts, leading to better system prompts especially on reasoning benchmarks.
Contribution
The paper shows that the effectiveness of prompt optimization depends on how the variance among system prompts compares with the variance among responses, and proposes $p1$, a simple filtering approach that improves optimization outcomes.
Findings
Scaling to more user prompts can reduce the variance among system prompts, which hinders optimization, especially on heterogeneous datasets.
Filtering the training set down to high-variance user prompts improves prompt optimization.
Training on just two prompts can produce a system prompt that generalizes well.
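The filtering idea in the findings above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `select_high_variance_prompts`, the reward array layout, and the choice to score each user prompt by the variance of its mean reward across candidate system prompts are all assumptions made for this example.

```python
import numpy as np


def select_high_variance_prompts(rewards, k):
    """Return indices of the k user prompts with the highest variance.

    Hypothetical helper. `rewards` is assumed to have shape
    (num_user_prompts, num_system_prompts, num_responses), holding one
    scalar reward per sampled response.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Average over sampled responses: one score per (user prompt, system prompt).
    mean_per_system = rewards.mean(axis=2)
    # Variance across system prompts, computed separately for each user prompt.
    variance_per_user_prompt = mean_per_system.var(axis=1)
    # Sort by descending variance; the stable sort keeps ties in input order.
    order = np.argsort(-variance_per_user_prompt, kind="stable")
    return order[:k].tolist()


# Toy data: 3 user prompts, 2 system prompts, 2 responses each.
toy_rewards = [
    [[1, 1], [0, 0]],  # system prompts disagree strongly
    [[1, 0], [0, 1]],  # only response noise, system prompts tie
    [[1, 1], [1, 0]],  # system prompts disagree mildly
]
selected = select_high_variance_prompts(toy_rewards, k=2)
print(selected)  # [0, 2]
```

In the toy data, the second user prompt is dropped because both system prompts earn the same mean reward on it, so it gives the optimizer no signal for telling them apart.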
Abstract
Prompt optimization improves language models without updating their weights by searching for a better system prompt, but its effectiveness varies widely across tasks. We study what makes a task amenable to prompt optimization. We show that the reward variance across different system prompts can be decomposed into two components: variance among responses, which captures generation stochasticity, and variance among system prompts, which captures differences in system prompt quality. Prompt optimization succeeds when variance among system prompts is sufficiently large, but fails when variance among responses dominates the variance of the system prompts. Surprisingly, we further show that scaling to more user prompts can hurt optimization by reducing variance among system prompts, especially on heterogeneous datasets where different user prompts favor different system prompts. Motivated by…
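One way to read the decomposition described in the abstract is through the law of total variance: total reward variance equals the average variance among responses within a system prompt plus the variance of the per-system-prompt mean rewards. The sketch below assumes that reading, which may differ from the paper's exact formulation. The function name `decompose_reward_variance` and the array layout are hypothetical.

```python
import numpy as np


def decompose_reward_variance(rewards):
    """Split reward variance into response-level and system-prompt-level parts.

    Hypothetical helper. `rewards` is assumed to have shape
    (num_system_prompts, num_responses), with the same number of sampled
    responses for every system prompt.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Variance among responses: within-prompt variance, averaged over prompts.
    among_responses = rewards.var(axis=1).mean()
    # Variance among system prompts: variance of the per-prompt mean rewards.
    among_system_prompts = rewards.mean(axis=1).var()
    # Total variance over every sampled reward (population variance).
    total = rewards.var()
    return {
        "among_responses": float(among_responses),
        "among_system_prompts": float(among_system_prompts),
        "total": float(total),
    }


# Toy data: 2 system prompts, 2 sampled responses each.
parts = decompose_reward_variance([[1, 0], [1, 1]])
print(parts)
```

With equal sample counts and population variance (NumPy's default `ddof=0`), the two components sum exactly to the total. Under this reading, optimization has signal to work with when `among_system_prompts` is large relative to `among_responses`.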
