Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees
Xiaoyu Ma, Yiwen Li, Haoyue Liu, Zhichao Wang, Ye Chen, Yongxin Guo, and Xiaoying Tang

TL;DR
This paper introduces POES, a prompt-aware evaluation scheduling method for automatic prompt optimization built on a submodular objective. By choosing evaluation examples more carefully, it improves accuracy while reducing token usage.
Contribution
The paper proposes POES, a submodular scheduling framework with formal guarantees that adaptively selects evaluation examples to make prompt optimization more efficient.
Findings
POES achieves a 6.2% accuracy improvement over baselines.
It reduces token overhead to approximately 4% of the evaluation budget.
Selecting fewer, better-chosen examples can match or outperform naive evaluation that costs more.
Abstract
Automatic prompt optimization (APO) hinges on the quality of its evaluation signal, yet scoring every prompt candidate on the full training set is prohibitively expensive. Existing methods either fix a single evaluation subset before optimization begins (principled but prompt-agnostic) or adapt it heuristically during optimization (flexible but unstable and lacking formal guarantees). We observe that APO naturally maps to an online adaptive testing problem: prompts are examinees, training examples are test items, and the scheduler should select items that best discriminate among the strongest candidates. This insight motivates Prompt-Aware Online Evaluation Scheduling (POES), which integrates an IRT-based discrimination utility, a facility-location coverage term, and switching-cost-aware warm-start swaps into a unified objective that is provably monotone submodular, yielding a (1-1/e)…
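To make the submodular selection idea concrete, here is a minimal sketch of greedy subset selection under a cardinality budget. It is not the authors' implementation. It assumes a nonnegative pairwise similarity matrix for the facility-location coverage term and a nonnegative per-example score standing in for the IRT-based discrimination utility. The function name `greedy_select`, the weight `lam`, and the toy data are hypothetical, and the switching-cost-aware warm-start swaps are omitted. Under these assumptions the objective is monotone submodular, so the standard greedy procedure carries the classical (1-1/e) approximation guarantee for cardinality-constrained maximization.

```python
import numpy as np


def greedy_select(similarity, item_utility, k, lam=1.0):
    """Greedily pick k examples maximizing
    F(S) = sum_i max_{j in S} similarity[i, j] + lam * sum_{j in S} item_utility[j].

    Assumes similarity and item_utility are nonnegative (hypothetical inputs).
    """
    similarity = np.asarray(similarity, dtype=float)
    item_utility = np.asarray(item_utility, dtype=float)
    n = similarity.shape[0]
    chosen = np.zeros(n, dtype=bool)
    selected = []
    # best_cover[i] is how well example i is covered by the current selection.
    best_cover = np.zeros(n)
    for _ in range(min(k, n)):
        # Marginal coverage gain of adding each candidate column j.
        cover_gain = np.maximum(similarity - best_cover[:, None], 0.0).sum(axis=0)
        gain = cover_gain + lam * item_utility
        gain[chosen] = -np.inf  # never pick the same example twice
        j = int(np.argmax(gain))
        chosen[j] = True
        selected.append(j)
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected


# Toy data: two clusters of near-duplicate examples, {0, 1} and {2, 3}.
sim = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])
utility = np.array([0.5, 0.1, 0.1, 0.3])  # stand-in discrimination scores
picked = greedy_select(sim, utility, k=2)
print(picked)
```

On the toy data the sketch picks one example from each cluster, favoring the higher-utility member of each: once a cluster is covered, the marginal coverage gain of its near-duplicate drops close to zero.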
