Evaluating Plan Compliance in Autonomous Programming Agents
Shuyang Liu, Saman Dehghan, Jatin Ganhotra, Martin Hirzel, Reyhaneh Jabbarvand

TL;DR
This study systematically analyzes how well autonomous programming agents follow instructed plans, revealing that proper plan guidance improves performance, but poorly aligned plans can hinder it, emphasizing the need for adaptive reasoning training.
Contribution
It provides the first large-scale, systematic evaluation of plan compliance in programming agents, highlighting the impact of plan quality and the importance of adaptive reasoning training.
Findings
Providing standard plans improves issue resolution.
Periodic plan reminders reduce violations and increase success.
Poorly aligned or overly complex plans can degrade performance.
Abstract
Agents aspire to eliminate the need for task-specific prompt crafting through autonomous reason-act-observe loops. Still, they are commonly instructed to follow a task-specific plan for guidance, e.g., to resolve software issues following phases for navigation, reproduction, patch, and validation. Unfortunately, it is unknown to what extent agents actually follow such instructed plans. Without such an analysis, determining the extent agents comply with a given plan, it is impossible to assess whether a solution was reached through correct strategic reasoning or through other means, e.g., data contamination or overfitting to a benchmark. This paper presents the first extensive, systematic analysis of plan compliance in programming agents, examining 16,991 trajectories from SWE-agent across four LLMs on SWE-bench Verified and SWE-bench Pro under eight plan variations. Without an explicit…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
