Tournament of Prompts: Evolving LLM Instructions Through Structured Debates and Elo Ratings
Anirudh Nair, Adi Banerjee, Laurent Mombaerts, Matthew Hagen, Tarik Borogovac

TL;DR
This paper introduces DEEVO, a novel debate-driven evolutionary framework that optimizes prompts for large language models using Elo ratings, enabling effective exploration and improvement without predefined metrics.
Contribution
DEEVO is presented as the first framework to combine debate-based evaluation with evolutionary prompt optimization. This allows exploration that preserves the semantics of a prompt and, according to the authors, outperforms existing methods.
Findings
DEEVO outperforms manual prompt engineering.
It surpasses state-of-the-art optimization methods.
It is effective on both open-ended and close-ended tasks.
Abstract
Prompt engineering represents a critical bottleneck in harnessing the full potential of Large Language Models (LLMs) for solving complex tasks, as it requires specialized expertise, significant trial-and-error, and manual intervention. This challenge is particularly pronounced for tasks involving subjective quality assessment, where defining explicit optimization objectives becomes fundamentally problematic. Existing automated prompt optimization methods falter in these scenarios, as they typically require well-defined task-specific numerical fitness functions or rely on generic templates that cannot capture the nuanced requirements of complex use cases. We introduce DEEVO (DEbate-driven EVOlutionary prompt optimization), a novel framework that guides prompt evolution through debate-driven evaluation with Elo-based selection. Contrary to prior work, DEEVO's approach enables exploration…
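To make the idea of Elo-based selection concrete, here is a minimal sketch of how a single debate outcome between two prompts could update their ratings. It uses the standard Elo formula. The function names, the K-factor of 32, and the starting rating of 1000 are illustrative assumptions, not values taken from the paper, and the debate itself (which would be judged by an LLM) is represented only by its winner.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one debate.

    k=32 is an assumed K-factor; the paper's setting may differ.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Hypothetical population: prompt text -> rating, all starting at 1000.
ratings = {"prompt_a": 1000.0, "prompt_b": 1000.0}

# Assume a debate judge declared prompt_a the winner.
ratings["prompt_a"], ratings["prompt_b"] = update_elo(
    ratings["prompt_a"], ratings["prompt_b"]
)
```

Because each update depends only on pairwise wins and losses, a scheme like this needs no task-specific numerical fitness function, which is the property the abstract emphasizes.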
Taxonomy
Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations
