TL;DR
P3 is a self-improvement framework that jointly optimizes the system and user prompts of a large language model. Because it accounts for the interdependence between the two prompts, it improves performance on both general and reasoning tasks.
Contribution
This work introduces P3, a novel iterative framework for simultaneous optimization of system and user prompts, advancing beyond unilateral prompt tuning methods.
Findings
P3 outperforms existing prompt optimization methods on multiple benchmarks.
Jointly optimizing both prompts improves LLM performance more than tuning the system prompt or the user prompt alone.
The framework is effective on general tasks (e.g., Arena-hard and Alpaca-eval) and on reasoning tasks (e.g., GSM8K and GPQA).
Abstract
Current large language model (LLM) applications often employ multi-component prompts, comprising both system and user prompts, to guide model behaviors. While recent advancements have demonstrated the efficacy of automatically optimizing either the system or user prompt to boost performance, such unilateral approaches often yield suboptimal outcomes due to the interdependent nature of these components. In this work, we introduce P3, a novel self-improvement framework that concurrently optimizes both system and user prompts through an iterative process. The offline optimized prompts are further leveraged to promote online prompting by performing query-dependent prompt optimization. Extensive experiments on general tasks (e.g., Arena-hard and Alpaca-eval) and reasoning tasks (e.g., GSM8K and GPQA) demonstrate that P3 achieves superior performance in the realm of automatic prompt…
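The interdependence argument in the abstract can be illustrated with a small sketch. This is not the authors' P3 algorithm, whose details are not given in the abstract; it is a simple alternating search over two candidate lists. The names `joint_optimize`, `toy_score`, and `TOY_SCORES` and all score values are hypothetical, and `toy_score` stands in for whatever LLM-based evaluation a real system would use.

```python
from typing import Callable, List, Tuple

# Hypothetical scores for each (system prompt, user prompt) pair.
# In practice these would come from evaluating LLM outputs, not a lookup table.
TOY_SCORES = {
    ("S0", "U0"): 0.5,
    ("S1", "U0"): 0.6,
    ("S0", "U1"): 0.55,
    ("S1", "U1"): 0.9,
}


def toy_score(system_prompt: str, user_prompt: str) -> float:
    """Stand-in for an LLM-based quality score of a prompt pair (assumption)."""
    return TOY_SCORES[(system_prompt, user_prompt)]


def joint_optimize(
    system_candidates: List[str],
    user_candidates: List[str],
    score: Callable[[str, str], float],
    rounds: int = 3,
) -> Tuple[str, str, float]:
    """Alternate between improving the system prompt and the user prompt.

    Each round picks the best system prompt for the current user prompt,
    then the best user prompt for that system prompt, and stops early
    once a full round no longer improves the score.
    """
    system, user = system_candidates[0], user_candidates[0]
    best = score(system, user)
    for _ in range(rounds):
        system = max(system_candidates, key=lambda s: score(s, user))
        user = max(user_candidates, key=lambda u: score(system, u))
        current = score(system, user)
        if current <= best:
            break
        best = current
    return system, user, best


systems = ["S0", "S1"]
users = ["U0", "U1"]

# Unilateral baseline: tune only the system prompt, keep the default user prompt.
system_only_score = max(toy_score(s, "U0") for s in systems)

# Joint search over both prompts.
best_system, best_user, best_score = joint_optimize(systems, users, toy_score)
print(system_only_score, best_system, best_user, best_score)
```

In this toy table, tuning only the system prompt stalls at a lower score than the alternating search reaches, which is the kind of gap the abstract attributes to unilateral optimization. The online, query-dependent stage mentioned in the abstract is not covered by this sketch.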