Understanding the Challenges in Iterative Generative Optimization with LLMs

Allen Nie; Xavier Daull; Zhiyi Kuang; Abhinav Akkiraju; Anish Chaudhuri; Max Piasevoli; Ryan Rong; YuCheng Yuan; Prerit Choudhary; Shannon Xiao; Rasool Fakoor; Adith Swaminathan; Ching-An Cheng

arXiv:2603.23994 · cs.LG · March 26, 2026

TL;DR

This paper investigates the practical challenges of iterative generative optimization with large language models, showing how setup choices determine whether it succeeds and offering guidance for configuring the learning loop across applications.

Contribution

It identifies key factors influencing generative optimization success and offers practical recommendations to improve its robustness and applicability across domains.

Findings

1. Starting artifacts influence solution reachability.
2. Truncated traces can still enhance performance.
3. Larger minibatches do not always improve generalization.

Abstract

Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in practice remains brittle: despite active research, only 9% of surveyed agents used any automated optimization. We argue that this brittleness arises because, to set up a learning loop, an engineer must make "hidden" design choices: What can the optimizer edit and what is the "right" learning evidence to provide at each update? We investigate three factors that affect most applications: the starting artifact, the credit horizon for execution traces, and batching trials and errors into learning evidence. Through case studies in MLAgentBench, Atari, and BigBench Extra Hard, we find that these design decisions can determine whether generative optimization succeeds, yet…
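The three design choices named in the abstract can be located in a generic learning loop. The Python sketch below is illustrative only: `optimize`, `truncate_trace`, `make_minibatches`, `Feedback`, and the toy number-guessing task are hypothetical names we introduce here, not the authors' code or any library's API. In a real system, `toy_propose` would be replaced by an LLM call that reads the evidence (including the traces) and rewrites the artifact.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, TypeVar

A = TypeVar("A")  # artifact type (a prompt, code, or here just a number)
T = TypeVar("T")  # task type


@dataclass
class Feedback:
    score: float
    trace: List[str]  # execution trace, oldest step first


# One piece of learning evidence: (task, score, possibly truncated trace).
Evidence = Tuple[object, float, List[str]]


def truncate_trace(trace: List[str], horizon: int) -> List[str]:
    """Hypothetical credit horizon: keep only the last `horizon` trace steps."""
    return trace[-horizon:] if horizon > 0 else []


def make_minibatches(tasks: Sequence[T], batch_size: int,
                     rng: random.Random) -> List[List[T]]:
    """Shuffle the tasks and split them into minibatches of trials."""
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


def optimize(start: A, tasks: Sequence[T],
             execute: Callable[[A, T], Feedback],
             propose: Callable[[A, List[Evidence]], A],
             horizon: int, batch_size: int, epochs: int, seed: int = 0) -> A:
    """Hypothetical generative-optimization loop exposing the three choices."""
    rng = random.Random(seed)
    artifact = start  # choice 1: the starting artifact
    for _ in range(epochs):
        # choice 3: how trials are batched into one update's evidence
        for batch in make_minibatches(tasks, batch_size, rng):
            evidence: List[Evidence] = []
            for task in batch:
                fb = execute(artifact, task)
                # choice 2: how much of each trace the optimizer gets to see
                evidence.append((task, fb.score, truncate_trace(fb.trace, horizon)))
            artifact = propose(artifact, evidence)  # an LLM call in a real system
    return artifact


# Toy stand-ins: the "artifact" is a numeric guess and each task is a target.
def toy_execute(guess: float, target: float) -> Feedback:
    return Feedback(score=-abs(guess - target),
                    trace=[f"target={target}", f"guess={guess}",
                           f"error={guess - target}"])


def toy_propose(guess: float, evidence: List[Evidence]) -> float:
    # Not an LLM: step halfway toward the mean target in this batch.
    mean_target = sum(task for task, _, _ in evidence) / len(evidence)
    return guess + 0.5 * (mean_target - guess)


result = optimize(0.0, [1.0, 2.0, 3.0, 4.0], toy_execute, toy_propose,
                  horizon=1, batch_size=4, epochs=20)
print(round(result, 3))  # prints 2.5
```

Lowering `horizon` hides earlier trace steps from the optimizer, and changing `batch_size` changes how many trials feed each update. This toy only shows where those knobs sit; it does not reproduce the paper's experiments or findings.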

Taxonomy

Topics: Artificial Intelligence in Games · Reinforcement Learning in Robotics · Language and cultural evolution