AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents
Hailin Zhong, Shengxin Zhu

TL;DR
This paper introduces a runtime substrate called the AI Harness Engineering framework, which mediates foundation-model software agents' interactions with projects to improve reliability and verifiability.
Contribution
It formalizes the AI Harness Engineering system, operationalizes it through a four-level ladder, and proposes an evaluation protocol to assess agent performance systematically.
Findings
Higher harness levels produce more detailed and verifiable agent outputs.
The framework enables systematic evaluation of agent episodes with evidence and attribution.
The paper reframes autonomous software engineering as a system-level challenge rather than a matter of model capability alone.
Abstract
Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability. We propose a different locus: software-engineering capability emerges from a model-harness-environment system, in which a runtime substrate -- the harness -- mediates how a foundation-model agent observes a project, acts on it, receives feedback, and establishes that a change is complete. We formalize this substrate as AI Harness Engineering and identify eleven component responsibilities: task specification, context selection, tool access, project memory, task state, observability, failure attribution, verification, permissions, entropy auditing, and intervention recording. We operationalize the harness through a four-level ladder (H0-H3) that progressively exposes…
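As a reading aid, the eleven responsibilities and the H0-H3 ladder named in the abstract can be written down as plain data types. This is a minimal sketch, not the authors' implementation: the class names `Responsibility`, `HarnessLevel`, and `HarnessConfig` are hypothetical. The excerpt is truncated before it says what each level exposes, so no level-to-responsibility mapping is assumed here and the caller supplies the enabled set.

```python
from dataclasses import dataclass, field
from enum import Enum, IntEnum


class Responsibility(Enum):
    """The eleven component responsibilities listed in the abstract."""
    TASK_SPECIFICATION = "task specification"
    CONTEXT_SELECTION = "context selection"
    TOOL_ACCESS = "tool access"
    PROJECT_MEMORY = "project memory"
    TASK_STATE = "task state"
    OBSERVABILITY = "observability"
    FAILURE_ATTRIBUTION = "failure attribution"
    VERIFICATION = "verification"
    PERMISSIONS = "permissions"
    ENTROPY_AUDITING = "entropy auditing"
    INTERVENTION_RECORDING = "intervention recording"


class HarnessLevel(IntEnum):
    """The four-level ladder H0-H3; only the ordering is modelled."""
    H0 = 0
    H1 = 1
    H2 = 2
    H3 = 3


@dataclass
class HarnessConfig:
    """Hypothetical container pairing a level with the responsibilities
    a given harness actually implements (supplied by the caller)."""
    level: HarnessLevel
    enabled: frozenset = field(default_factory=frozenset)

    def missing(self):
        """Return the responsibilities not enabled, in declaration order."""
        return [r for r in Responsibility if r not in self.enabled]
```

For example, `HarnessConfig(HarnessLevel.H1, frozenset({Responsibility.TOOL_ACCESS, Responsibility.VERIFICATION})).missing()` lists the nine responsibilities that this illustrative configuration leaves uncovered. The pairing of H1 with those two responsibilities is invented for the example and does not come from the paper.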
