PrefPO: Pairwise Preference Prompt Optimization

Rahul Singhal; Pradyumna Tambwekar; Karime Maamari

arXiv:2603.19311·cs.CL·March 26, 2026

TL;DR

PrefPO is an automated prompt optimization method inspired by reinforcement learning from human feedback (RLHF). It reduces the need for labeled data, produces shorter and less repetitive prompts, mitigates prompt hacking, and matches or exceeds state-of-the-art methods on 6 of 9 BIG-Bench Hard tasks.

Contribution

PrefPO introduces a preference-based, minimally supervised prompt optimization approach that matches or exceeds existing methods on most evaluated tasks while improving prompt hygiene and robustness.

Findings

01. PrefPO matches or exceeds SOTA on 6/9 tasks.

02. PrefPO reduces prompt length and repetition by 3-5x.

03. PrefPO is less susceptible to prompt hacking than TextGrad.

Abstract

Prompt engineering is effective but labor-intensive, motivating automated optimization methods. Existing methods typically require labeled datasets, which are often unavailable, and produce verbose, repetitive prompts. We introduce PrefPO, a minimal prompt optimization approach inspired by reinforcement learning from human feedback (RLHF). Its preference-based approach reduces the need for labeled data and hyperparameter tuning: only a starting prompt and natural language criteria are needed. PrefPO uses an LLM discriminator to express pairwise preferences over model outputs and provide feedback to an LLM optimizer, iteratively improving performance. We evaluate PrefPO on 9 BIG-Bench Hard (BBH) tasks and IFEval-Hard, a newly curated, challenging subset of IFEval. PrefPO matches or exceeds SOTA methods, including GEPA, MIPRO, and TextGrad, on 6/9 tasks and performs comparably to TextGrad…
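The discriminator/optimizer loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function `preference_loop`, its signature, the keep-the-winner update rule, and the three toy stand-ins for LLM calls are all assumptions made for illustration. The abstract confirms only that a discriminator expresses pairwise preferences over model outputs and passes feedback to an optimizer, starting from a prompt and natural language criteria.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces; in practice each would wrap an LLM call.
Generate = Callable[[str, List[str]], List[str]]                     # (prompt, inputs) -> outputs
Discriminate = Callable[[str, List[str], List[str]], Tuple[int, str]]  # -> (winner 0/1, feedback)
Optimize = Callable[[str, str], str]                                  # (prompt, feedback) -> new prompt


def preference_loop(
    start_prompt: str,
    criteria: str,
    inputs: List[str],
    generate: Generate,
    discriminate: Discriminate,
    optimize: Optimize,
    steps: int = 3,
) -> str:
    """Iteratively refine a prompt using pairwise preferences (assumed update rule)."""
    best = start_prompt
    # Assumption: bootstrap the first challenger from the criteria alone.
    candidate = optimize(best, criteria)
    for _ in range(steps):
        out_best = generate(best, inputs)
        out_cand = generate(candidate, inputs)
        # The discriminator compares the two output sets against the criteria.
        winner, feedback = discriminate(criteria, out_best, out_cand)
        if winner == 1:
            best = candidate
        # The optimizer proposes a new challenger from the feedback.
        candidate = optimize(best, feedback)
    return best


# Toy stand-ins (no LLM involved) so the sketch runs end to end.
def toy_generate(prompt: str, inputs: List[str]) -> List[str]:
    suffix = " DONE" if "DONE" in prompt else ""
    return [f"answer to {x}{suffix}" for x in inputs]


def toy_discriminate(criteria: str, a: List[str], b: List[str]) -> Tuple[int, str]:
    def score(outs: List[str]) -> int:
        return sum(o.endswith("DONE") for o in outs)
    winner = 1 if score(b) > score(a) else 0
    return winner, "Outputs should end with DONE."


def toy_optimize(prompt: str, feedback: str) -> str:
    if "DONE" in feedback and "DONE" not in prompt:
        return prompt + " End every answer with DONE."
    return prompt


if __name__ == "__main__":
    final = preference_loop(
        "Answer the question.",
        "Every answer must end with DONE.",
        ["2+2?", "3+3?"],
        toy_generate, toy_discriminate, toy_optimize,
    )
    print(final)
```

Note that the loop needs no labeled answers: the only supervision is the criteria string and the discriminator's pairwise choice, which mirrors the "starting prompt and natural language criteria" requirement stated above.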

Code & Models

Datasets

rahul-singhal/IFEval-Hard

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms · Reinforcement Learning in Robotics