Evaluating Plan Compliance in Autonomous Programming Agents

Shuyang Liu; Saman Dehghan; Jatin Ganhotra; Martin Hirzel; Reyhaneh Jabbarvand

arXiv:2604.12147 · cs.SE · April 29, 2026

TL;DR

This study systematically analyzes how well autonomous programming agents follow instructed plans. It finds that appropriate plan guidance improves performance while poorly aligned plans can hinder it, which underscores the need for adaptive reasoning training.

Contribution

It provides the first large-scale, systematic evaluation of plan compliance in programming agents, highlighting the impact of plan quality and the importance of adaptive reasoning training.

Findings

1. Providing standard plans improves issue resolution.
2. Periodic plan reminders reduce violations and increase success.
3. Poorly aligned or overly complex plans can degrade performance.

Abstract

Agents aspire to eliminate the need for task-specific prompt crafting through autonomous reason-act-observe loops. Still, they are commonly instructed to follow a task-specific plan for guidance, e.g., to resolve software issues by following phases for navigation, reproduction, patching, and validation. Unfortunately, it is unknown to what extent agents actually follow such instructed plans. Without an analysis of how closely agents comply with a given plan, it is impossible to assess whether a solution was reached through correct strategic reasoning or through other means, e.g., data contamination or overfitting to a benchmark. This paper presents the first extensive, systematic analysis of plan compliance in programming agents, examining 16,991 trajectories from SWE-agent across four LLMs on SWE-bench Verified and SWE-bench Pro under eight plan variations. Without an explicit…
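To make the idea of plan compliance concrete, the following is a minimal Python sketch of one way a trajectory could be checked against an ordered plan. It assumes every trajectory step has already been labeled with one of the four phases named above (navigation, reproduction, patch, validation). The labeling itself, the names `check_compliance` and `ComplianceReport`, and the two violation rules (skipped phases, backward jumps) are hypothetical illustrations, not the paper's actual metric.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical plan: the four issue-resolution phases mentioned in the abstract.
PLAN = ["navigation", "reproduction", "patch", "validation"]


@dataclass
class ComplianceReport:
    skipped: List[str]   # plan phases that never appear in the trajectory
    out_of_order: int    # steps that jump back to an earlier phase
    compliant: bool      # True only if nothing is skipped or out of order


def check_compliance(steps: Sequence[str], plan: Sequence[str] = PLAN) -> ComplianceReport:
    """Compare per-step phase labels with an ordered plan (illustrative rules only).

    A step counts as out of order if its phase precedes the furthest phase
    reached so far; a phase counts as skipped if no step is labeled with it.
    """
    rank = {phase: i for i, phase in enumerate(plan)}
    furthest = -1
    out_of_order = 0
    for phase in steps:
        if phase not in rank:
            raise ValueError(f"unknown phase label: {phase!r}")
        r = rank[phase]
        if r < furthest:
            out_of_order += 1
        furthest = max(furthest, r)
    visited = set(steps)
    skipped = [p for p in plan if p not in visited]
    return ComplianceReport(skipped, out_of_order, not skipped and out_of_order == 0)


if __name__ == "__main__":
    # An agent that never reproduces the bug and re-patches after validating.
    trajectory = ["navigation", "navigation", "patch", "validation", "patch", "validation"]
    print(check_compliance(trajectory))
    # → ComplianceReport(skipped=['reproduction'], out_of_order=1, compliant=False)
```

This sketch treats any return to an earlier phase as a violation, so the patch-after-validation step above is flagged even though iterating on a failed fix can be legitimate. A finer-grained checker might whitelist such loops; which behaviors count as violations is exactly the kind of choice a compliance study has to make explicit.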
