DelvePO: Direction-Guided Self-Evolving Framework for Flexible Prompt Optimization
Tao Tao, Guanghui Zhu, Lang Guo, Hongyi Chen, Chunfeng Yuan, Yihua Huang

TL;DR
DelvePO is a flexible, self-evolving prompt optimization framework that decouples prompt components, uses working memory to guide prompt generation, and outperforms previous methods across diverse tasks and models.
Contribution
It introduces a task-agnostic, direction-guided self-evolving framework for prompt optimization that improves transferability and stability over existing approaches.
Findings
DelvePO outperforms SOTA methods on various tasks.
It demonstrates high transferability across different LLMs.
The framework effectively alleviates prompt instability.
Abstract
Prompt Optimization has emerged as a crucial approach due to its capabilities in steering Large Language Models (LLMs) to solve various tasks. However, current works mainly rely on the random rewriting ability of LLMs, and the optimization process generally focuses on specific influencing factors, which makes it easy to fall into local optima. Besides, the performance of the optimized prompt is often unstable, which limits its transferability across different tasks. To address the above challenges, we propose DelvePO (Direction-Guided Self-Evolving Framework for Flexible Prompt Optimization), a task-agnostic framework to optimize prompts in a self-evolving manner. In our framework, we decouple prompts into different components that can be used to explore the impact that different factors may have on various tasks.…
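As a rough illustration of the component decoupling described in the abstract and the working memory mentioned in the summary, here is a minimal Python sketch. It is not the authors' implementation: the component types, the pool contents, the `toy_score` stand-in for a dev-set evaluation, and the `WorkingMemory.suggest` rule are all hypothetical, and in the actual framework an LLM would do the rewriting while a task metric would do the scoring.

```python
import random
from dataclasses import dataclass, field

# Hypothetical component pool; the paper's actual component types may differ.
POOL = {
    "role": ["You are a helpful assistant.", "You are a careful sentiment analyst."],
    "task": ["Label the review.", "Classify the review as positive or negative."],
    "format": ["Reply freely.", "Reply with a single word."],
}

def render(components):
    """Join the components into one prompt string in a fixed order."""
    return " ".join(components[k] for k in POOL)

def toy_score(components):
    """Assumed stand-in for a dev-set metric: rewards the second option of each type."""
    return sum(POOL[k].index(v) for k, v in components.items()) / len(POOL)

@dataclass
class WorkingMemory:
    component_memory: dict = field(default_factory=dict)  # (type, old, new) -> score change
    prompt_memory: list = field(default_factory=list)     # (score, prompt text), best first

    def record(self, ctype, old, new, delta, score, prompt, keep=5):
        self.component_memory[(ctype, old, new)] = delta
        self.prompt_memory.append((score, prompt))
        self.prompt_memory.sort(key=lambda item: item[0], reverse=True)
        del self.prompt_memory[keep:]  # retain only the top prompts

    def suggest(self, ctype, current):
        """Return the replacement with the largest recorded gain, or None."""
        gains = {new: d for (t, old, new), d in self.component_memory.items()
                 if t == ctype and old == current and d > 0}
        return max(gains, key=gains.get) if gains else None

def evolve(pop_size=4, steps=30, seed=0):
    rng = random.Random(seed)
    memory = WorkingMemory()
    population = [{k: rng.choice(v) for k, v in POOL.items()} for _ in range(pop_size)]
    for _ in range(steps):
        i = rng.randrange(pop_size)
        current = population[i]
        score = toy_score(current)
        ctype = rng.choice(list(POOL))
        old = current[ctype]
        # Follow a remembered direction if one exists, otherwise explore randomly.
        new = memory.suggest(ctype, old) or rng.choice([v for v in POOL[ctype] if v != old])
        candidate = {**current, ctype: new}
        cand_score = toy_score(candidate)
        memory.record(ctype, old, new, cand_score - score, cand_score, render(candidate))
        if cand_score > score:
            population[i] = candidate  # keep the candidate only if it improves
    best = max(population, key=toy_score)
    return best, toy_score(best), memory
```

Calling `evolve()` returns the best component assignment, its toy score, and the filled memory. The point of the design is that a swap that helped one prompt is recorded in `component_memory` and can then be reused as a direction for other prompts that share the same old component, instead of relying on random rewriting each time.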
Peer Reviews
Decision · Submitted to ICLR 2026
- Decomposing prompts into interpretable components is valuable for understanding what makes prompts effective
- Testing across multiple LLMs and domains demonstrates effort to validate generalizability
- Detailed appendices with all prompts used enhance transparency
- The working memory design that stores both component-level and prompt-level information is sensible
- The core contributions are incremental improvements over existing evolutionary prompt optimization methods
- Lack of significance testing and inconsistent use of random seeds weakens confidence in reported improvements
- The framework requires extensive prompt engineering (Sub-tasks I-II, Sub-solutions I-II, multiple scenarios) that may limit adoption
- Practical Limitations: 1. Higher computational costs than baselines 2. Requires predefined component types that may not transfer across domai
* DelvePO achieves better performance than previous baselines, and the ablation study shows the effectiveness of each component in the method.
* This paper is well-written, and the motivation is clear and meaningful.
* The selected datasets and tasks are classical and relatively easy for LLMs; they are no longer difficult for current strong LLMs. I'm curious about the performance of DelvePO on more challenging tasks in the LLM era, such as GSM8K, BBH, and other reasoning tasks.
* This paper introduces memory, and in essence, the memory appears to consist of insights concluded from last-generation prompts, which is a little far-fetched. OPRO[1] gives previous good-performing prompts and worse prompts to gu
The paper introduces a clear component-level prompt representation together with two explicit working memories (Component Memory and Prompt Memory), which turns otherwise highly stochastic LLM-based prompt mutation into a more controllable and reusable optimization process.
1. The paper proposes a task-agnostic framework, but the initial component pool is manually collected and constructed from a wide range of related literature (line 116). This raises a question about the motivation of the method: does DelvePO truly make no strong task-specific assumptions and generalize to different tasks because of the framework design itself, or is the observed generality mainly due to the fact that a very comprehensive task component pool has been pre-collected and constructed
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques
