AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

Hailin Zhong; Shengxin Zhu

arXiv:2605.13357·cs.SE·May 14, 2026

TL;DR

This paper introduces AI Harness Engineering, a framework for the runtime substrate (the harness) that mediates how foundation-model software agents interact with a project, with the aim of making agent work more reliable and verifiable.

Contribution

The paper formalizes the harness as AI Harness Engineering, operationalizes it through a four-level ladder (H0-H3), and proposes an evaluation protocol for assessing agent performance systematically.

Findings

01. Higher harness levels produce more detailed and verifiable agent outputs.

02. The framework enables systematic evaluation of agent episodes with evidence and attribution.

03. The paper reframes autonomous software engineering as a system-level challenge rather than a matter of model capability alone.

Abstract

Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability. We propose a different locus: software-engineering capability emerges from a model-harness-environment system, in which a runtime substrate -- the harness -- mediates how a foundation-model agent observes a project, acts on it, receives feedback, and establishes that a change is complete. We formalize this substrate as AI Harness Engineering and identify eleven component responsibilities: task specification, context selection, tool access, project memory, task state, observability, failure attribution, verification, permissions, entropy auditing, and intervention recording. We operationalize the harness through a four-level ladder (H0-H3) that progressively exposes…
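To make the vocabulary of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It encodes the eleven component responsibilities and the H0-H3 level names as enumerations. The `EpisodeRecord` class, its fields, and its methods are hypothetical names introduced only for illustration. The sketch says nothing about what each level exposes, because the abstract is truncated at that point.

```python
from dataclasses import dataclass, field
from enum import Enum, IntEnum
from typing import Dict, List


class Responsibility(Enum):
    """The eleven component responsibilities named in the abstract."""
    TASK_SPECIFICATION = "task specification"
    CONTEXT_SELECTION = "context selection"
    TOOL_ACCESS = "tool access"
    PROJECT_MEMORY = "project memory"
    TASK_STATE = "task state"
    OBSERVABILITY = "observability"
    FAILURE_ATTRIBUTION = "failure attribution"
    VERIFICATION = "verification"
    PERMISSIONS = "permissions"
    ENTROPY_AUDITING = "entropy auditing"
    INTERVENTION_RECORDING = "intervention recording"


class HarnessLevel(IntEnum):
    """Level names of the four-level ladder; their contents are not modeled here."""
    H0 = 0
    H1 = 1
    H2 = 2
    H3 = 3


@dataclass
class EpisodeRecord:
    """Hypothetical container for evidence gathered during one agent episode."""
    task: str
    level: HarnessLevel
    evidence: Dict[Responsibility, List[str]] = field(default_factory=dict)

    def log(self, responsibility: Responsibility, note: str) -> None:
        # Attach a free-text evidence note to one responsibility.
        self.evidence.setdefault(responsibility, []).append(note)

    def uncovered(self) -> List[Responsibility]:
        # Responsibilities for which this episode recorded no evidence.
        return [r for r in Responsibility if r not in self.evidence]


record = EpisodeRecord(task="fix failing unit test", level=HarnessLevel.H1)
record.log(Responsibility.VERIFICATION, "test suite re-run after the change")
print(len(record.uncovered()))  # 10
```

Keeping the evidence keyed by responsibility is one possible design choice: it makes it easy to ask which responsibilities an episode left without any recorded evidence, which is the kind of question an evidence-and-attribution evaluation would need to answer.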
