Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Jiahang Lin; Shichun Liu; Chengjun Pan; Lizhi Lin; Shihan Dou; Zhiheng Xi; Xuanjing Huang; Hang Yan; Zhenhua Han; Tao Gui; Yu-Gang Jiang

arXiv:2604.25850 · cs.CL · May 19, 2026


TL;DR

This paper introduces Agentic Harness Engineering (AHE), an automated, observability-driven method for evolving coding-agent harnesses, yielding harnesses that perform better and transfer across models.

Contribution

AHE provides a novel closed-loop framework built on three observability pillars (component, experience, and decision observability) that enables autonomous harness evolution without manual trial-and-error.
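The closed loop can be pictured as a keep-or-revert cycle. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: the harness is modeled as a mapping from hypothetical file names to their contents (in the spirit of component observability), each edit carries a self-declared predicted change in pass rate that is later compared with the measured change (in the spirit of decision observability), and an edit is kept only if the measured score does not drop. `HarnessEdit`, `DecisionRecord`, `evolve_step`, and `toy_evaluate` are hypothetical names; a real evaluator would run the agent on benchmark tasks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: each editable harness component is a file (path -> text).
Harness = Dict[str, str]


@dataclass
class HarnessEdit:
    path: str               # which component file the edit touches (illustrative)
    new_content: str        # replacement text for that file
    predicted_delta: float  # self-declared expected change in pass rate


@dataclass
class DecisionRecord:
    edit: HarnessEdit
    observed_delta: float
    kept: bool

    @property
    def prediction_error(self) -> float:
        # Gap between the measured effect and the self-declared prediction.
        return self.observed_delta - self.edit.predicted_delta


def evolve_step(
    harness: Harness,
    edit: HarnessEdit,
    evaluate: Callable[[Harness], float],
    log: List[DecisionRecord],
) -> Harness:
    """Apply one edit, measure it, log prediction vs. outcome, then keep or revert."""
    baseline = evaluate(harness)
    candidate = dict(harness)  # copy, so the previous version stays available for revert
    candidate[edit.path] = edit.new_content
    observed = evaluate(candidate) - baseline
    kept = observed >= 0.0  # acceptance rule is an assumption: keep non-regressing edits
    log.append(DecisionRecord(edit, observed, kept))
    return candidate if kept else harness


# Toy evaluator (hypothetical): a harness scores higher if its prompt mentions running tests.
def toy_evaluate(h: Harness) -> float:
    return 0.70 + (0.05 if "run tests" in h.get("system_prompt.md", "") else 0.0)


log: List[DecisionRecord] = []
h0: Harness = {"system_prompt.md": "You are a coding agent."}
h1 = evolve_step(
    h0, HarnessEdit("system_prompt.md", "You are a coding agent. Always run tests.", 0.03),
    toy_evaluate, log,
)
h2 = evolve_step(h1, HarnessEdit("system_prompt.md", "Be brief.", 0.01), toy_evaluate, log)
```

In this toy run the first edit is kept (measured +0.05 against a predicted +0.03), while the second regresses the score and is reverted, so `h2` is the same object as `h1`. Keeping every decision as a record makes it possible to audit how well the evolving agent's predictions track reality.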

Findings

1. AHE iterations improve pass@1 from 69.7% to 77.0% on Terminal-Bench 2.
2. Evolved harness components transfer effectively without re-evolution.
3. Cross-family gains of +5.1 to +10.1 percentage points across model types.
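The findings report gains in percentage points, so it helps to spell out the headline number. Using only the two pass@1 values stated above, the gain works out as follows; the relative figure is derived here for illustration and is not a number given on this page.

```latex
\Delta_{\text{abs}} = 77.0\% - 69.7\% = 7.3 \ \text{percentage points}

\Delta_{\text{rel}} = \frac{7.3}{69.7} \approx 0.105 \quad (\approx 10.5\% \ \text{relative improvement})
```

The cross-family range of +5.1 to +10.1 is likewise expressed in percentage points, i.e. as absolute differences in pass rate rather than relative ratios.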

Abstract

Harnesses are now central to coding-agent performance, mediating how models interact with tools and execution environments. Yet harness engineering remains a manual craft, because automating it faces a heterogeneous action space across editable components, voluminous trajectories that bury actionable signal, and edits whose effect is hard to attribute. We introduce Agentic Harness Engineering (AHE), a closed loop that addresses these challenges through three matched observability pillars: (1) component observability gives every editable harness component a file-level representation so the action space is explicit and revertible; (2) experience observability distills millions of raw trajectory tokens into a layered, drill-down evidence corpus that an evolving agent can actually consume; and (3) decision observability pairs every edit with a self-declared prediction, later verified…
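Experience observability is described as distilling raw trajectories into a layered, drill-down evidence corpus. The sketch below shows one possible shape for such layering, purely as an assumption and at toy scale: a corpus-wide overview read first, a short per-task digest, and pointers back into the raw trajectory steps that an evolving agent opens only on demand. The record fields (`task_id`, `passed`, `steps`, `error`) and the heuristic that labels a failed task by its last erroring tool are hypothetical, not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    tool: str
    output: str
    error: Optional[str] = None  # hypothetical: error message if the tool call failed


@dataclass
class Trajectory:
    task_id: str
    passed: bool
    steps: List[Step] = field(default_factory=list)


@dataclass
class TaskDigest:
    task_id: str
    passed: bool
    failure_label: Optional[str]   # short label an evolving agent can scan
    error_step_indices: List[int]  # drill-down pointers into the raw trajectory


def digest(traj: Trajectory) -> TaskDigest:
    """Middle layer: compress one trajectory into a short digest with pointers."""
    error_idx = [i for i, s in enumerate(traj.steps) if s.error is not None]
    label = None
    if not traj.passed:
        # Crude heuristic (assumption): label the failure by the last erroring tool.
        label = f"{traj.steps[error_idx[-1]].tool}_error" if error_idx else "no_tool_error"
    return TaskDigest(traj.task_id, traj.passed, label, error_idx)


def summarize(digests: List[TaskDigest]) -> Dict[str, object]:
    """Top layer: corpus-wide overview that the agent reads first."""
    failures = Counter(d.failure_label for d in digests if not d.passed)
    return {
        "pass_rate": sum(d.passed for d in digests) / len(digests),
        "failure_modes": dict(failures.most_common()),
    }


def drill_down(traj: Trajectory, d: TaskDigest) -> List[Step]:
    """Bottom layer: fetch only the raw steps that the digest points at."""
    return [traj.steps[i] for i in d.error_step_indices]


trajs = [
    Trajectory("t1", True, [Step("bash", "ok")]),
    Trajectory("t2", False, [Step("bash", "", error="timeout"), Step("edit", "done")]),
    Trajectory("t3", False, [Step("edit", "", error="patch rejected")]),
]
digests = [digest(t) for t in trajs]
overview = summarize(digests)
```

Here `overview` reports one pass in three tasks and one failure each labeled `bash_error` and `edit_error`; calling `drill_down(trajs[1], digests[1])` then returns only the timed-out `bash` step instead of the whole trajectory. The design choice being illustrated is that the agent reads small summaries by default and pays the token cost of raw steps only when a summary points somewhere worth inspecting.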
