SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback

Yaoning Yu; Ye Yu; Peiyan Zhang; Kai Wei; Haojing Luo; Haohan Wang

arXiv:2505.19514·cs.CL·January 28, 2026

Open Access · 3 Reviews

TL;DR

SIPDO introduces a closed-loop framework that iteratively improves prompts for large language models by generating synthetic data to identify weaknesses and refine prompts without external supervision.

Contribution

It presents a novel self-improving prompt optimization method that combines synthetic data generation with prompt refinement in a feedback loop.

Findings

01 · Outperforms standard prompt tuning methods on QA and reasoning benchmarks.

02 · Demonstrates systematic prompt improvement through synthetic data feedback.

03 · Enhances prompt robustness without external supervision.

Abstract

Prompt quality plays a critical role in the performance of large language models (LLMs), motivating a growing body of work on prompt optimization. Most existing methods optimize prompts over a fixed dataset, assuming static input distributions and offering limited support for iterative improvement. We introduce SIPDO (Self-Improving Prompts through Data-Augmented Optimization), a closed-loop framework for prompt learning that integrates synthetic data generation into the optimization process. SIPDO couples a synthetic data generator with a prompt optimizer, where the generator produces new examples that reveal current prompt weaknesses and the optimizer incrementally refines the prompt in response. This feedback-driven loop enables systematic improvement of prompt performance without assuming access to external supervision or new tasks. Experiments across question answering and…
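
The generator/optimizer loop described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the function names (`closed_loop`, `toy_generate`, `toy_answer`, `toy_refine`), the integer difficulty schedule, and the choice to re-check every previously generated example are all assumptions made for illustration. The toy "model" is a lookup over rules written into the prompt and stands in for a real LLM call.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)


def closed_loop(
    prompt: str,
    generate: Callable[[str, int], List[Example]],
    answer: Callable[[str, str], str],
    refine: Callable[[str, List[Example]], str],
    max_difficulty: int = 3,
) -> str:
    """Hypothetical loop: generate examples, find failures, refine the prompt."""
    seen: List[Example] = []
    for difficulty in range(1, max_difficulty + 1):
        # Generator step: new synthetic examples at the current difficulty.
        seen.extend(generate(prompt, difficulty))
        # Evaluation step: which examples does the current prompt get wrong?
        # (Re-checking all earlier examples is an assumption of this sketch.)
        failures = [(q, a) for q, a in seen if answer(prompt, q) != a]
        # Optimizer step: refine only when a weakness was revealed.
        if failures:
            prompt = refine(prompt, failures)
    return prompt


def toy_generate(prompt: str, difficulty: int) -> List[Example]:
    # Stand-in generator: one synthetic example per difficulty level.
    return [(f"q{difficulty}", f"a{difficulty}")]


def toy_answer(prompt: str, question: str) -> str:
    # Stand-in "model": answers only if the prompt contains a matching rule.
    for line in prompt.splitlines():
        if line.startswith(question + " -> "):
            return line.split(" -> ", 1)[1]
    return "?"


def toy_refine(prompt: str, failures: List[Example]) -> str:
    # Stand-in optimizer: append one rule per failed example.
    rules = [f"{q} -> {a}" for q, a in failures]
    return "\n".join([prompt, *rules])


if __name__ == "__main__":
    final = closed_loop("Answer the question.", toy_generate, toy_answer, toy_refine)
    print(final)
```

In a real setting the three callables would wrap LLM calls, and the refinement step would rewrite the prompt rather than memorize answers; the sketch only shows the control flow of a feedback loop in which each round's failures drive the next prompt revision.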

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- Integrating synthetic data generation with prompt optimization into a dynamic closed-loop framework goes beyond traditional optimization methods that operate on static datasets.
- Difficulty is monotonically increased through a difficulty parameter and curriculum learning, aligning with principles of human learning.
- The ablation studies are well-designed and effectively validate the contributions of the core components.

Weaknesses

- Each optimization iteration requires multiple LLM invocations, resulting in significantly higher computational cost compared to conventional approaches. Although the three-expert verification mechanism improves data quality, it substantially increases latency.
- Generation quality is directly constrained by the capabilities of the underlying LLM; for instance, GPT-4o-mini performs markedly worse than GPT-4o. The paper does not explore strategies to reduce reliance on powerful base models, limi

Reviewer 02 · Rating 6 · Confidence 3

Strengths

SIPDO introduces a closed-loop optimization framework that couples a synthetic data generator with a prompt optimizer through a dual-agent collaboration mechanism. The generator dynamically produces challenging samples targeting the current prompt’s weaknesses, while a progressive difficulty parameter c enables a curriculum learning strategy from simple to complex tasks. Ablation results demonstrate that this difficulty-gradient method improves average performance by 17.3%–24.3% compared to on

Weaknesses

1. Although Table 2 provides a comprehensive overview of prompt optimization baselines, the current comparisons mainly cover works from 2022–2024 and lack the inclusion of more recent 2025 methods. In particular, direct comparisons with the latest closed-loop or iterative prompt optimization approaches are missing. Most existing baselines used in this paper focus on heuristic or search-based prompt engineering rather than a fully integrated feedback loop.
2. While the related work section (pp. 2

Reviewer 03 · Rating 4 · Confidence 4

Strengths

**Originality** This work demonstrates originality in three respects: (1) it integrates synthetic data generation into prompt optimization; (2) the authors construct synthetic examples with a progressive difficulty design to dynamically guide prompt refinement; (3) the authors provide a theoretical guarantee for the proposed framework in the form of prompt error bounds.

**Quality & Clarity** The framework is well-structured and self-contained. Extensive experimen

Weaknesses

1. The authors claim no external supervision is used in this method, but the true data, including the questions and answers fed into the "Data Generator" module, actually provide the supervision signals for the loop.
2. There is no quantitative demonstration of how LLM performance fluctuates when the input distribution changes, as mentioned in L48.
3. How is the difficulty level L mentioned in Line 172 defined? There is no investigation of the influence of this number on overall performance.
4. The "recommen

Taxonomy

Topics: Fault Detection and Control Systems · Advanced Control Systems Optimization · Control Systems and Identification