AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering
Di Zhang

TL;DR
AgentDevel applies release engineering to LLM agents. Instead of relying on self-improvement mechanisms built into the agent, it externalizes improvement into a regression-aware pipeline designed to produce stable, auditable, non-regressing updates.
Contribution
The paper presents AgentDevel, a release engineering pipeline for LLM agents that emphasizes non-regression, externalized diagnostics, and stable iterative improvement.
Findings
Yields stable improvements with fewer regressions.
Produces reproducible, auditable artifacts.
Maintains a single canonical version line.
Abstract
Recent progress in large language model (LLM) agents has largely focused on embedding self-improvement mechanisms inside the agent or searching over many concurrent variants. While these approaches can raise aggregate scores, they often yield unstable and hard-to-audit improvement trajectories, making it difficult to guarantee non-regression or to reason about failures across versions. We reframe agent improvement as release engineering: agents are treated as shippable artifacts, and improvement is externalized into a regression-aware release pipeline. We introduce AgentDevel, a release engineering pipeline that iteratively runs the current agent, produces implementation-blind, symptom-level quality signals from execution traces, synthesizes a single release candidate (RC) via executable diagnosis, and promotes it under flip-centered gating. AgentDevel features three…
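The abstract names flip-centered gating but does not spell out the promotion rule. The sketch below illustrates one plausible reading, assuming every version is scored pass/fail on the same fixed task set: a pass-to-fail flip counts as a regression, a fail-to-pass flip as a fix, and an RC is promoted only if its regressions stay within a budget and it fixes at least something. The names `compute_flips`, `passes_gate`, `ReleaseLine`, and the thresholds `max_regressions` and `min_fixes` are hypothetical illustrations, not AgentDevel's actual interface or criteria.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Outcome of one evaluation run: task ID -> passed?
Results = Dict[str, bool]


@dataclass(frozen=True)
class FlipReport:
    fixes: List[str]        # tasks that flipped fail -> pass
    regressions: List[str]  # tasks that flipped pass -> fail
    unchanged: int          # tasks whose outcome did not change


def compute_flips(baseline: Results, candidate: Results) -> FlipReport:
    """Compare per-task outcomes of the current release and a release candidate."""
    if baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must cover the same task IDs")
    fixes: List[str] = []
    regressions: List[str] = []
    unchanged = 0
    for task_id in sorted(baseline):
        before, after = baseline[task_id], candidate[task_id]
        if not before and after:
            fixes.append(task_id)
        elif before and not after:
            regressions.append(task_id)
        else:
            unchanged += 1
    return FlipReport(fixes, regressions, unchanged)


def passes_gate(report: FlipReport, max_regressions: int = 0, min_fixes: int = 1) -> bool:
    """Hypothetical gate: bounded pass->fail flips and at least some fail->pass flips."""
    return len(report.regressions) <= max_regressions and len(report.fixes) >= min_fixes


@dataclass
class ReleaseLine:
    """Single canonical version line: a candidate is appended only if promoted."""
    current_version: str
    current_results: Results
    history: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.history.append(self.current_version)

    def try_promote(self, rc_version: str, rc_results: Results) -> Tuple[bool, FlipReport]:
        report = compute_flips(self.current_results, rc_results)
        promoted = passes_gate(report)
        if promoted:
            self.current_version = rc_version
            self.current_results = dict(rc_results)
            self.history.append(rc_version)
        return promoted, report


line = ReleaseLine("v0", {"t1": True, "t2": False, "t3": True, "t4": False})

ok, rep = line.try_promote("v1", {"t1": True, "t2": True, "t3": True, "t4": False})
print(ok, rep.fixes, rep.regressions)  # True ['t2'] []

ok, rep = line.try_promote("v2", {"t1": False, "t2": True, "t3": True, "t4": True})
print(ok, rep.fixes, rep.regressions)  # False ['t4'] ['t1']

print(line.history)  # ['v0', 'v1']
```

In this sketch the second candidate is rejected even though its aggregate pass count matches the current release's, because it breaks a previously passing task. Rejected candidates never enter `history`, so only one canonical version line exists. Whether AgentDevel tolerates any regression budget above zero is an assumption here, not something the abstract states.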
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Software Engineering Techniques and Practices
