PrefPO: Pairwise Preference Prompt Optimization
Rahul Singhal, Pradyumna Tambwekar, Karime Maamari

TL;DR
PrefPO is a reinforcement learning-inspired method for automated prompt optimization that reduces the need for labeled data, improves prompt quality, and mitigates prompt hacking, achieving state-of-the-art results on challenging tasks.
Contribution
PrefPO introduces a preference-based, minimally supervised prompt optimization approach that outperforms existing methods and enhances prompt hygiene and robustness.
Findings
PrefPO matches or exceeds SOTA on 6/9 tasks.
PrefPO reduces prompt length and repetition by 3-5x.
PrefPO is less susceptible to prompt hacking than TextGrad.
Abstract
Prompt engineering is effective but labor-intensive, motivating automated optimization methods. Existing methods typically require labeled datasets, which are often unavailable, and produce verbose, repetitive prompts. We introduce PrefPO, a minimal prompt optimization approach inspired by reinforcement learning from human feedback (RLHF). Its preference-based approach reduces the need for labeled data and hyperparameter tuning: only a starting prompt and natural language criteria are needed. PrefPO uses an LLM discriminator to express pairwise preferences over model outputs and provide feedback to an LLM optimizer, iteratively improving performance. We evaluate PrefPO on 9 BIG-Bench Hard (BBH) tasks and IFEval-Hard, a newly curated, challenging subset of IFEval. PrefPO matches or exceeds SOTA methods, including GEPA, MIPRO, and TextGrad, on 6/9 tasks and performs comparably to TextGrad…
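The loop the abstract describes (a discriminator comparing outputs pairwise and feeding critique to an optimizer) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the champion/challenger scheme, the function name `preference_loop`, and the three toy callables standing in for LLM calls are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces; in practice each would wrap an LLM call.
Generate = Callable[[str, str], str]  # (prompt, task_input) -> output
Discriminate = Callable[[str, List[str], List[str]], Tuple[int, str]]
Optimize = Callable[[str, str], str]  # (prompt, feedback) -> revised prompt


def preference_loop(start_prompt: str, criteria: str, inputs: List[str],
                    generate: Generate, discriminate: Discriminate,
                    optimize: Optimize, steps: int = 3) -> str:
    """Assumed champion/challenger loop driven by pairwise preferences."""
    champion = start_prompt
    # The first challenger is derived from the criteria alone (an assumption).
    challenger = optimize(start_prompt, criteria)
    for _ in range(steps):
        out_a = [generate(champion, x) for x in inputs]
        out_b = [generate(challenger, x) for x in inputs]
        # The discriminator returns the preferred side (0 or 1) plus feedback.
        winner, feedback = discriminate(criteria, out_a, out_b)
        if winner == 1:
            champion = challenger
        # The optimizer proposes a new challenger from the current champion.
        challenger = optimize(champion, feedback)
    return champion


# Toy stand-ins so the sketch runs without any model access.
def toy_generate(prompt: str, task_input: str) -> str:
    return task_input.upper() if "UPPERCASE" in prompt else task_input


def toy_discriminate(criteria: str, out_a: List[str],
                     out_b: List[str]) -> Tuple[int, str]:
    score_a = sum(o.isupper() for o in out_a)
    score_b = sum(o.isupper() for o in out_b)
    winner = 1 if score_b > score_a else 0  # ties keep the champion
    return winner, "Tell the model to answer in UPPERCASE."


def toy_optimize(prompt: str, feedback: str) -> str:
    if "UPPERCASE" in prompt:
        return prompt
    return prompt + " Respond in UPPERCASE."


result = preference_loop(
    start_prompt="Repeat the word.",
    criteria="Answers must be written in uppercase.",
    inputs=["cat", "dog"],
    generate=toy_generate,
    discriminate=toy_discriminate,
    optimize=toy_optimize,
)
print(result)
```

Note that no labeled answers appear anywhere in the loop: the only supervision is the starting prompt and the natural language criteria, which matches the setup the abstract describes. How PrefPO actually forms pairs and selects candidates is not stated here, so that part of the sketch should be read as a placeholder.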
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics
