AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering

Rajesh Kumar; Waqar Ali; Junaid Ahmed; Najma Imtiaz Ali; Shaban Usman

arXiv:2604.13120 · cs.SE · April 16, 2026

TL;DR

AgentForge introduces an execution-grounded multi-agent framework for autonomous software engineering, emphasizing sandboxed code verification to improve correctness and performance.

Contribution

This work formalizes execution-grounded verification as a core principle and demonstrates its effectiveness in a multi-agent LLM system for software development.

Findings

01

Resolves 40.0% of tasks on SWE-bench Lite, surpassing single-agent baselines by 26-28 percentage points.

02

Execution feedback and role decomposition independently improve system performance.

03

Open-source implementation available at https://github.com/raja21068/AutoCodeAI.

Abstract

Large language models generate plausible code but cannot verify correctness. Existing multi-agent systems simulate execution or leave verification optional. We introduce execution-grounded verification as a first-class principle: every code change must survive sandboxed execution before propagation. We instantiate this principle in AgentForge, a multi-agent framework where Planner, Coder, Tester, Debugger, and Critic agents coordinate through shared memory and a mandatory Docker sandbox. We formalize software engineering with LLMs as an iterative decision process over repository states, where execution feedback provides a stronger supervision signal than next-token likelihood. AgentForge achieves 40.0% resolution on SWE-bench Lite, outperforming single-agent baselines by 26-28 points. Ablations confirm that execution feedback and role decomposition each independently drive…
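The gating rule in the abstract (a change propagates only if it survives sandboxed execution) can be illustrated with a minimal sketch. This is not the AgentForge implementation: the names `run_in_sandbox` and `propagate_if_verified` are hypothetical, the repository is modeled as a plain dict of file contents, and a temporary directory plus a subprocess stands in for the Docker sandbox the paper mandates.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_sandbox(files, test_cmd, timeout=30.0):
    """Write `files` into a throwaway directory and run `test_cmd` there.

    Assumption: a temp dir + subprocess is only a stand-in for a real
    container sandbox; it gives no security isolation.
    Returns (passed, combined_output).
    """
    with tempfile.TemporaryDirectory() as workdir:
        for name, content in files.items():
            path = Path(workdir) / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content)
        try:
            proc = subprocess.run(
                test_cmd, cwd=workdir, capture_output=True,
                text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stdout + proc.stderr


def propagate_if_verified(repo, patch, test_cmd):
    """Apply `patch` to a copy of `repo`; keep it only if execution succeeds.

    Hypothetical helper: returns (new_repo_state, accepted, execution_log).
    On failure the original state is returned unchanged, and the log is the
    feedback a debugging step could consume.
    """
    candidate = {**repo, **patch}
    passed, log = run_in_sandbox(candidate, test_cmd)
    return (candidate if passed else repo), passed, log


if __name__ == "__main__":
    repo = {
        "calc.py": "def add(a, b):\n    return a - b\n",
        "test_calc.py": "from calc import add\nassert add(2, 3) == 5\n",
    }
    test_cmd = [sys.executable, "test_calc.py"]

    bad_patch = {"calc.py": "def add(a, b):\n    return a * b\n"}
    good_patch = {"calc.py": "def add(a, b):\n    return a + b\n"}

    repo, accepted, _ = propagate_if_verified(repo, bad_patch, test_cmd)
    print("bad patch accepted:", accepted)
    repo, accepted, _ = propagate_if_verified(repo, good_patch, test_cmd)
    print("good patch accepted:", accepted)
```

The design point the sketch tries to capture is that verification is not optional: the only path by which the repository state changes runs through an actual execution, and a rejected patch leaves the state untouched while still yielding a log.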

Code & Models

Repositories

raja21068/AutoCodeAI (GitHub)
