Task Facet Learning: A Structured Approach to Prompt Optimization

Gurusha Juneja; Gautam Jajoo; Nagarajan Natarajan; Hua Li; Jian Jiao; Amit Sharma

arXiv:2406.10504·cs.AI·May 20, 2025

TL;DR

This paper introduces UniPrompt, a structured algorithm for prompt optimization that learns multiple task facets from training data, resulting in more accurate and complex prompts than existing methods.

Contribution

UniPrompt is a novel algorithm that leverages task structure and clustering to generate comprehensive prompts by capturing multiple facets, outperforming prior prompt tuning approaches.

Findings

01. UniPrompt achieves higher accuracy than human-tuned prompts.

02. It can generate longer, more complex prompts.

03. It outperforms state-of-the-art prompt optimization methods.

Abstract

Given a task in the form of a basic description and its training examples, prompt optimization is the problem of synthesizing the given information into a text prompt for a large language model. Humans solve this problem by also considering the different facets that define a task (e.g., counter-examples, explanations, analogies) and including them in the prompt. However, it is unclear whether existing algorithmic approaches, based on iteratively editing a given prompt or automatically selecting a few in-context examples, can cover the multiple facets required to solve a complex task. In this work, we view prompt optimization as that of learning multiple facets of a task from a set of training examples. We exploit structure in the prompt optimization problem and break down a prompt into loosely coupled semantic sections. The proposed algorithm, UniPrompt, (1) clusters the input space and…
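The ideas above (a prompt split into loosely coupled sections, a clustered input space, and the mini-batch feedback aggregated at the batch level that the reviews describe) can be illustrated with a minimal sketch. This is not the authors' implementation: the section names, the `topic` cluster key, the `propose_edit` stand-in for an LLM feedback call, and the majority-vote aggregation are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class SectionedPrompt:
    """A prompt stored as named sections (section names are illustrative)."""
    sections: dict = field(default_factory=dict)

    def render(self):
        # Concatenate the sections into the text that would be sent to a model.
        return "\n\n".join(f"{name}:\n{body}" for name, body in self.sections.items())

    def apply_edit(self, section, text):
        # Append text to one section only; the other sections stay unchanged.
        current = self.sections.get(section, "")
        self.sections[section] = f"{current}\n{text}".strip()


def cluster_examples(examples, key):
    """Group examples by a key (a simple stand-in for a real clustering step)."""
    clusters = defaultdict(list)
    for example in examples:
        clusters[key(example)].append(example)
    return dict(clusters)


def minibatches(items, size):
    """Split one cluster into consecutive mini-batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def aggregate_feedback(proposals):
    """Batch-level step (assumed here to be a majority vote over proposed edits)."""
    return Counter(proposals).most_common(1)[0][0]


def propose_edit(minibatch):
    # Hypothetical stand-in for an LLM call that reads a mini-batch of
    # examples and proposes a (section, text) edit.
    topic = minibatch[0]["topic"]
    return ("Counter-examples", f"Watch for tricky {topic} cases.")


examples = [
    {"text": "example 1", "topic": "sarcasm"},
    {"text": "example 2", "topic": "sarcasm"},
    {"text": "example 3", "topic": "sarcasm"},
    {"text": "example 4", "topic": "slang"},
]

prompt = SectionedPrompt({"Task description": "Classify the text as hateful or not."})

for topic, cluster in cluster_examples(examples, key=lambda ex: ex["topic"]).items():
    # First tier: one proposed edit per mini-batch within the cluster.
    proposals = [propose_edit(batch) for batch in minibatches(cluster, size=2)]
    # Second tier: a single edit per cluster is applied to the prompt.
    section, text = aggregate_feedback(proposals)
    prompt.apply_edit(section, text)

print(prompt.render())
```

Keeping edits scoped to a single section means feedback from one cluster can refine, say, the counter-examples without rewriting the task description. In a real system, the clustering step and both feedback steps would presumably be driven by model calls rather than the deterministic stand-ins used here.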

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 5·Confidence 3

Strengths

The strategy of creating mini-batches to generate edits and then aggregating them at the batch level to yield the final edit in a feedback mechanism is interesting. The experimental results achieve SOTA.

Weaknesses

1. The paper contains some minor issues, e.g., $U_{ni}P_{rompt}$ in lines 100-101 and an extra space before "So" in lines 197-198.
2. The experiments should be reorganized and polished carefully before submission, especially in Sections 4.1, 4.2, and 4.5.
3. Some qualitative experiments should be added to validate the differences in reported accuracies under various mini-batch sizes.

Reviewer 02·Rating 5·Confidence 3

Strengths

- UNIPROMPT clusters examples to capture task-specific facets effectively, with a two-tier feedback mechanism ensuring that prompt modifications are generalizable and not overly specific to individual examples.
- The authors provide strong empirical results, with UNIPROMPT outperforming state-of-the-art methods on multiple datasets, notably achieving significant improvements on hate speech classification.
- By focusing on automatic prompt generation without relying heavily on human-engineered p

Weaknesses

- All experiments are conducted on powerful proprietary LLMs. I wonder whether the proposed method remains effective for open-source LLMs such as LLaMA or Qwen.
- The proposed method seems to underperform DSPy on MedQA while surpassing it on the other benchmarks. What might be the reason?
- How is the number of clusters determined when constructing the mini-batches, and how does it affect the method's effectiveness?

Reviewer 03·Rating 6·Confidence 4

Strengths

(1) The paper introduces a new method for prompt optimization that considers multiple facets of a task and leverages the structure in the prompt.
(2) The proposed clustering approach and two-tier feedback mechanism provide a systematic way to generate prompt edits that capture generalizable concepts.
(3) The experimental results on several datasets show that UNIPROMPT outperforms human-tuned prompts and SOTA methods.

Weaknesses

(1) I am skeptical about the current importance of prompt optimization. Prompt optimization was more of a need when large models were less powerful (for example, last year). Nowadays, instruction-tuned models follow human instructions very well, and many of them are 0-shot; see the llama-3.1 model card. I would like to hear the authors' opinion on this.
(2) For the results in Table 1, adding an average column is recommended to make the presentation clearer.
(3) I found that on the ARC task,

Videos

Task Facet Learning: A Structured Approach To Prompt Optimization· underline

Taxonomy

Topics: Reservoir Engineering and Simulation Methods · AI-based Problem Solving and Planning · Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training