The Last Harness You'll Ever Build

Haebin Seong; Li Yin; Haoran Zhang; Zhan Shi

arXiv:2604.21003 · cs.AI · May 5, 2026

TL;DR

This paper introduces a two-level automated framework that evolves and meta-optimizes AI task harnesses, reducing manual engineering and enabling rapid adaptation to new domains.

Contribution

The paper presents a two-level framework that automates harness engineering for a single task and meta-optimizes that process across diverse tasks, with the aim of removing manual harness design.

Findings

01. The framework automates harness design for complex AI tasks.

02. Meta-evolution learns a blueprint for rapid adaptation.

03. The paper formalizes the process with algorithms inspired by meta-learning.

Abstract

AI agents are increasingly deployed on complex, domain-specific workflows: navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the Harness Evolution Loop optimizes a worker agent's harness $H$ for a single task: a Worker Agent $W_{H}$ executes the task, an Evaluator Agent $V$ adversarially diagnoses failures and scores performance,…
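The worker-and-evaluator part of the first-level loop can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract is truncated before it describes the rest of the loop, so the `revise` step, the keep-the-best acceptance rule, the `Harness` and `Evaluation` types, and the toy task are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Harness:
    """Hypothetical stand-in for H: a prompt plus a tuple of tool names."""
    prompt: str
    tools: Tuple[str, ...] = ()


@dataclass
class Evaluation:
    """Hypothetical evaluator output: a score and the diagnosed failures."""
    score: float
    failures: List[str]


def evolve_harness(
    initial: Harness,
    run_worker: Callable[[Harness], List[str]],
    evaluate: Callable[[List[str]], Evaluation],
    revise: Callable[[Harness, Evaluation], Harness],
    rounds: int = 3,
) -> Tuple[Harness, Evaluation]:
    """Run worker -> evaluator -> revise, keeping the best-scoring harness.

    The revise step and the greedy acceptance rule are assumptions of this
    sketch; the source text does not specify them.
    """
    best = initial
    best_eval = evaluate(run_worker(best))
    for _ in range(rounds):
        if not best_eval.failures:
            break  # nothing left for the evaluator to diagnose
        candidate = revise(best, best_eval)
        candidate_eval = evaluate(run_worker(candidate))
        if candidate_eval.score > best_eval.score:
            best, best_eval = candidate, candidate_eval
    return best, best_eval


# Toy task (assumed): the worker succeeds only with all three tools available.
REQUIRED = ("search", "extract", "synthesize")


def toy_worker(harness: Harness) -> List[str]:
    # The "output" is simply the list of tools the worker was able to use.
    return list(harness.tools)


def toy_evaluator(output: List[str]) -> Evaluation:
    missing = [tool for tool in REQUIRED if tool not in output]
    return Evaluation(score=1 - len(missing) / len(REQUIRED), failures=missing)


def toy_revise(harness: Harness, evaluation: Evaluation) -> Harness:
    # Address the first diagnosed failure by adding the missing tool.
    return Harness(harness.prompt, harness.tools + (evaluation.failures[0],))


best, best_eval = evolve_harness(
    Harness(prompt="Answer the research question.", tools=("search",)),
    toy_worker,
    toy_evaluator,
    toy_revise,
)
print(best.tools, best_eval.score)
```

In this toy setup each round repairs one diagnosed failure, so the loop stops once the evaluator reports none. A real harness would also cover orchestration logic and evaluation criteria, which the sketch leaves out.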
