PromptWizard: Task-Aware Prompt Optimization Framework

Eshaan Agarwal; Joykirat Singh; Vivek Dani; Raghav Magazine; Tanuja Ganu; and Akshay Nambi

arXiv:2405.18369·cs.CL·October 4, 2024·2 cites

TL;DR

PromptWizard is an automated, self-adapting framework that optimizes prompts for large language models, improving task performance efficiently with minimal data and cost.

Contribution

It introduces a novel self-evolving prompt optimization method that systematically refines prompts for diverse tasks, reducing manual effort and resource usage.

Findings

1. Achieves superior performance on 45 tasks
2. Reduces API calls and token usage significantly
3. Effective with limited data and smaller models

Abstract

Large language models (LLMs) have transformed AI across diverse domains, with prompting being central to their success in guiding model outputs. However, manual prompt engineering is both labor-intensive and domain-specific, creating a need for automated solutions. We introduce PromptWizard, a novel, fully automated framework for discrete prompt optimization, utilizing a self-evolving, self-adapting mechanism. Through a feedback-driven critique and synthesis process, PromptWizard achieves an effective balance between exploration and exploitation, iteratively refining both prompt instructions and in-context examples to generate human-readable, task-specific prompts. This guided approach systematically improves prompt quality, resulting in superior performance across 45 tasks. PromptWizard excels even with limited training data, smaller LLMs, and various LLM architectures.…
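The critique-and-synthesis loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or the microsoft/promptwizard API: the function names (`score`, `refine`, `toy_answer`, `toy_mutate`, `toy_critique`, `toy_synthesize`) are hypothetical, and the toy stand-ins replace real LLM calls so the example runs offline. It only shows the general shape of alternating exploration (mutating a prompt) with exploitation (critiquing failures and synthesizing a revision).

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)


def score(prompt: str, batch: List[Example],
          answer: Callable[[str, str], str]) -> float:
    """Fraction of batch examples answered correctly under this prompt."""
    hits = sum(answer(prompt, q) == a for q, a in batch)
    return hits / len(batch)


def refine(prompt: str,
           batch: List[Example],
           answer: Callable[[str, str], str],
           mutate: Callable[[str], List[str]],
           critique: Callable[[str, List[Example]], str],
           synthesize: Callable[[str, str], str],
           rounds: int = 3) -> str:
    """Iteratively refine a prompt (hypothetical sketch, not the paper's method)."""
    best = prompt
    for _ in range(rounds):
        # Exploration: propose variants and keep the highest-scoring one.
        candidates = [best] + mutate(best)
        best = max(candidates, key=lambda p: score(p, batch, answer))
        # Exploitation: critique the failures, then synthesize a revision.
        failures = [(q, a) for q, a in batch if answer(best, q) != a]
        if not failures:
            break
        revised = synthesize(best, critique(best, failures))
        if score(revised, batch, answer) >= score(best, batch, answer):
            best = revised
    return best


# Toy stand-ins (assumptions for illustration); a real setup would call an LLM.
def toy_answer(prompt: str, question: str) -> str:
    # The "model" only adds correctly when asked to reason step by step.
    if "step by step" not in prompt:
        return "?"
    a, b = question.split("+")
    return str(int(a) + int(b))


def toy_mutate(prompt: str) -> List[str]:
    return [prompt + " Be concise."]


def toy_critique(prompt: str, failures: List[Example]) -> str:
    return "The prompt does not ask for step-by-step reasoning."


def toy_synthesize(prompt: str, feedback: str) -> str:
    return prompt + " Think step by step."


batch = [("2+3", "5"), ("10+4", "14")]
result = refine("Solve the problem.", batch, toy_answer,
                toy_mutate, toy_critique, toy_synthesize)
print(result)
```

In this sketch the critique is a fixed string and the revision is accepted only if it does not lower the mini-batch score; an actual system would generate both the critique and the revision with an LLM and would also refine the in-context examples, which this sketch omits.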

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The proposed system appears pretty extensive and combines several techniques for discrete prompt engineering.
2. The experimental evaluation is well done and the results show significant wins in accuracy and cost over existing systems.
3. The ablation points to potentially significant wins from their method to construct few-shot chain-of-thought reasoning examples.

Weaknesses

1. My primary concern with this paper is that while the results are impressive, I am struggling to identify the key insight or transferable idea. What makes PromptWizard better than all the systems it beats? There is an ablation, but it is one paragraph and seems to suggest that reasoning and few-shot examples are the primary source of improvement.
2. While the writing was reasonable, the fairly complex pipeline of evolutionary optimizers and sequential optimization made it difficult to understand

Reviewer 02 · Rating 5 · Confidence 4

Strengths

1. The method proposed in this paper, PW, optimizes both the prompt instructions and the in-context examples contained in the discrete prompt, which makes for a comprehensive approach and yields considerable performance gains in experiments.
2. Compared to recent LLM-based prompt optimization methods, PW achieves satisfying results at a much lower cost in API calls.

Weaknesses

1. Although the authors describe the differences between PW and existing prompt optimization methods, the technical novelty of the proposed method is still somewhat weak. Refining prompt instructions and selecting the most effective in-context examples are widely recognized concepts.
2. The presentation of the overall pipeline of the approach could be improved. For example, moving the main algorithm in the appendix into the main text will make the presentation clearer an

Reviewer 03 · Rating 3 · Confidence 4

Strengths

The optimization of head prompt and few-shot example prompt together is one novel component. I have not seen this setting in other papers before. The experiments over BBH and GSM8K and some ablation studies are appreciated.

Weaknesses

1) The innovation is overclaimed. Integrating error feedback for LLMs to refine prompts has already been proposed in other works, such as APO (https://openreview.net/pdf?id=WRYhaSrThy), PromptAgent (https://arxiv.org/pdf/2310.16427), and PROMST (https://arxiv.org/pdf/2402.08702).
2) Following from the above, the authors only compare with baselines like Evoprompt, PromptBreeder, and APE, which indeed do not use error feedback. However, why not compare with works using error feedback su

Code & Models

Repositories

microsoft/promptwizard

Models

🤗
muhammedAdnan3/PromptWizardCornai
model

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management · Business Process Modeling and Analysis