Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems

Xing Zhang; Guanghui Wang; Yanwei Cui; Wei Qiu; Ziyuan Li; Bing Zhu; Peiyang He

arXiv:2604.14585·cs.AI·April 17, 2026

TL;DR

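This paper examines when prompt optimization helps in compound AI systems. Across 72 optimization runs on Claude Haiku, 49% score below the zero-shot baseline, which makes the outcome statistically indistinguishable from a coin flip. The authors provide diagnostic tests that predict when optimization will be beneficial.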

Contribution

It introduces a two-stage diagnostic framework, combining an ANOVA pre-test with a headroom test, to determine when prompt optimization is likely to improve on the zero-shot baseline.

Findings

01 Optimization helps only on tasks with exploitable output structure: a format the model can produce but does not default to.

02 Interaction effects between agent prompts are never statistically significant ($p > 0.52$, all $F < 1.0$).

03 The diagnostic tools predict optimization success with high accuracy.
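To make the interaction finding concrete, here is a minimal sketch of the interaction F statistic from a textbook balanced two-way ANOVA with replication. It is not the paper's procedure: the function name `interaction_f`, the nested-list layout (`cells[i][j]` holds the replicate scores for prompt variant `i` of one agent paired with variant `j` of another), and the toy numbers are all assumptions made for illustration. An F value below 1 means the interaction mean square is smaller than the within-cell error mean square.

```python
from statistics import mean


def interaction_f(cells):
    """Interaction F statistic for a balanced two-factor design.

    cells[i][j] is a list of replicate scores for level i of factor A
    and level j of factor B (hypothetical layout, equal replicates).
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    row_mean = [mean(cell_mean[i]) for i in range(a)]
    col_mean = [mean(cell_mean[i][j] for i in range(a)) for j in range(b)]
    grand = mean(row_mean)

    # Interaction sum of squares: what is left of each cell mean after
    # removing the two main effects.
    ss_ab = r * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    )
    # Error sum of squares: replicate scatter inside each cell.
    ss_err = sum(
        (y - cell_mean[i][j]) ** 2
        for i in range(a)
        for j in range(b)
        for y in cells[i][j]
    )
    df_ab = (a - 1) * (b - 1)
    df_err = a * b * (r - 1)
    return (ss_ab / df_ab) / (ss_err / df_err)


# Toy data (invented): purely additive effects vs. one cell that only
# benefits when both prompt variants are changed together.
additive = [[[1, 3], [2, 4]], [[2, 4], [3, 5]]]
interacting = [[[1, 3], [2, 4]], [[2, 4], [7, 9]]]
print(interaction_f(additive), interaction_f(interacting))
```

Converting F to a p-value needs the F distribution with `(a - 1) * (b - 1)` and `a * b * (r - 1)` degrees of freedom, which a statistics library would supply; the sketch stops at the statistic to stay dependency-free.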

Abstract

Prompt optimization in compound AI systems is statistically indistinguishable from a coin flip: across 72 optimization runs on Claude Haiku (6 methods $\times$ 4 tasks $\times$ 3 repeats), 49% score below zero-shot; on Amazon Nova Lite, the failure rate is even higher. Yet on one task, all six methods improve over zero-shot by up to $+6.8$ points. What distinguishes success from failure? We investigate with 18,000 grid evaluations and 144 optimization runs, testing two assumptions behind end-to-end optimization tools like TextGrad and DSPy: (A) individual prompts are worth optimizing, and (B) agent prompts interact, requiring joint optimization. Interaction effects are never significant ($p > 0.52$, all $F < 1.0$), and optimization helps only when the task has exploitable output structure -- a format the model can produce but does not default to. We provide a two-stage diagnostic: an…
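The "coin flip" framing can be illustrated with an exact two-sided binomial test against a 50% failure rate. This is only a sketch under stated assumptions: the abstract reports 49% of 72 runs, not a raw count, so the count of 35 below is inferred by rounding, and the helper `binom_two_sided_p` is a hypothetical name. The paper may use a different test, and runs that share a task or method are not truly independent.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of every
    outcome that is no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


n_runs = 6 * 4 * 3              # methods x tasks x repeats, from the abstract
n_below = round(0.49 * n_runs)  # ASSUMPTION: 49% of 72 corresponds to 35 runs
p_value = binom_two_sided_p(n_below, n_runs)
print(f"{n_below}/{n_runs} runs below zero-shot, two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value is far above conventional significance thresholds, so the observed failure rate gives no evidence against a fair coin.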
