TL;DR
This paper introduces a bandit-based framework for multi-objective prompt optimization in large language models, addressing the challenge of multi-faceted performance metrics with theoretical guarantees and practical efficiency.
Contribution
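It adapts provably efficient multi-objective bandit algorithms to prompt selection under two settings, Pareto prompt set recovery and best feasible prompt identification. It also introduces a new design for best feasible arm identification in structured bandits, with theoretical guarantees on the identification error in the linear case, and reports empirical improvements over baselines.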
Findings
Bandit-based methods outperform baselines in multi-objective prompt selection.
The new best feasible arm identification design comes with theoretical guarantees on the identification error in the linear case.
Experiments across multiple LLMs validate the framework's effectiveness.
Abstract
Prompt engineering has become central to eliciting the capabilities of large language models (LLMs). At its core lies prompt selection -- efficiently identifying the most effective prompts. However, most prior investigations overlook a key challenge: the inherently multi-faceted nature of prompt performance, which cannot be captured by a single metric. To fill this gap, we study the multi-objective prompt selection problem under two practical settings: Pareto prompt set recovery and best feasible prompt identification. Casting the problem into the pure-exploration bandits framework, we adapt provably efficient algorithms from multi-objective bandits and further introduce a novel design for best feasible arm identification in structured bandits, with theoretical guarantees on the identification error in the linear case. Extensive experiments across multiple LLMs show that the…
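The two selection targets named in the abstract can be illustrated on a small table of known mean scores. The sketch below is not the paper's algorithm: it only computes the quantities a pure-exploration bandit method would try to identify from noisy evaluations. The function names, the score matrix, and the reading of "feasible" as "every constrained metric meets a threshold" are assumptions made for illustration.

```python
import numpy as np

def pareto_set(means):
    """Indices of prompts not dominated by any other prompt (higher is better)."""
    means = np.asarray(means, dtype=float)
    keep = []
    for i, mi in enumerate(means):
        # j dominates i if j is at least as good on every metric
        # and strictly better on at least one.
        dominated = any(
            np.all(mj >= mi) and np.any(mj > mi)
            for j, mj in enumerate(means)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

def best_feasible(means, objective=0, thresholds=None):
    """Prompt maximizing `objective` among prompts meeting all thresholds.

    `thresholds` maps a metric index to its minimum acceptable mean
    (an assumed form of the feasibility constraint). Returns None if
    no prompt is feasible.
    """
    means = np.asarray(means, dtype=float)
    thresholds = thresholds or {}
    feasible = [
        i for i in range(len(means))
        if all(means[i, k] >= t for k, t in thresholds.items())
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda i: means[i, objective])

# Hypothetical mean scores: rows are prompts, columns are two metrics
# (for example accuracy and a safety score).
scores = np.array([
    [0.90, 0.40],
    [0.80, 0.70],
    [0.60, 0.90],
    [0.55, 0.60],
])

pareto = pareto_set(scores)
best = best_feasible(scores, objective=0, thresholds={1: 0.6})
```

In the bandit setting these means are unknown, and each prompt evaluation returns a noisy sample, so the difficulty lies in deciding which prompts to evaluate next in order to identify these sets with few evaluations.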