Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization

Ximing Dong; Shaowei Wang; Dayi Lin; Ahmed E. Hassan

arXiv:2505.10736 · cs.CL · August 20, 2025

TL;DR

This paper introduces IPOMP, a novel method for selecting evaluation data for prompt optimization in large language models, improving reliability and efficiency through real-time performance feedback and semantic clustering.

Contribution

IPOMP is a two-stage, real-time, performance-guided data selection method that enhances prompt optimization by addressing limitations of existing coreset techniques.

Findings

1. IPOMP improves prompt optimization effectiveness by up to 5.3%.

2. IPOMP increases the stability of evaluation results by at least 57%.

3. The method incurs minimal computational overhead, below 1%.

Abstract

Optimizing Large Language Model (LLM) performance requires well-crafted prompts, but manual prompt engineering is labor-intensive and often ineffective. Automated prompt optimization techniques address this challenge, but most of them rely on randomly selected evaluation subsets, which fail to represent the full dataset, leading to unreliable evaluations and suboptimal prompts. Existing coreset selection methods, designed for LLM benchmarking, are unsuitable for prompt optimization due to challenges in clustering similar samples, high data collection costs, and the unavailability of performance data for new or private datasets. To overcome these issues, we propose IPOMP, an Iterative evaluation data selection method for effective Prompt Optimization using real-time Model Performance. IPOMP is a two-stage approach that selects representative and diverse samples using semantic clustering…
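To make the idea of selecting representative and diverse samples through semantic clustering more concrete, here is a minimal sketch. It is an illustration only, not the authors' procedure: the abstract is truncated before the details, so everything below is an assumption. It uses plain k-means with a deterministic farthest-point initialization, picks the sample nearest each cluster centroid, and uses toy 2-D vectors in place of real sentence embeddings. All function names are hypothetical.

```python
# Sketch: pick one evaluation sample per semantic cluster.
# Assumption: each sample is already represented by an embedding vector;
# the toy 2-D vectors below stand in for real sentence embeddings.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def farthest_point_init(vectors, k):
    """Deterministic init: start at the first vector, then repeatedly add
    the vector farthest from its nearest already-chosen centroid."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids

def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns (centroids, cluster index per vector)."""
    centroids = farthest_point_init(vectors, k)
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return centroids, assignments

def select_representatives(vectors, k):
    """Return sorted indices of the sample closest to each cluster centroid."""
    centroids, assignments = kmeans(vectors, k)
    selected = []
    for c in range(k):
        member_ids = [i for i, a in enumerate(assignments) if a == c]
        if member_ids:
            selected.append(
                min(member_ids,
                    key=lambda i: squared_distance(vectors[i], centroids[c]))
            )
    return sorted(selected)

# Toy data: three well-separated groups of three samples each.
toy_embeddings = [
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
]
print(select_representatives(toy_embeddings, k=3))
```

With real data, the toy vectors would be replaced by embeddings of the evaluation examples, and `k` by the desired size of the evaluation subset. Selecting one sample per cluster covers distinct regions of the embedding space, which is one simple way to get diversity. It does not use any model performance signal, which the paper describes as a further ingredient of IPOMP.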

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification