The Last Harness You'll Ever Build
Haebin Seong, Li Yin, Haoran Zhang, Zhan Shi

TL;DR
This paper introduces a two-level automated framework that evolves and meta-optimizes AI task harnesses, reducing manual engineering and enabling rapid adaptation to new domains.
Contribution
It presents a hierarchical framework that automates harness engineering for a single task and meta-optimizes that process across diverse tasks, reducing manual design effort.
Findings
The framework automates harness design for complex AI tasks.
Meta-evolution learns a blueprint for rapid adaptation.
It formalizes the process with algorithms inspired by meta-learning.
Abstract
AI agents are increasingly deployed on complex, domain-specific workflows: navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. Each new task domain requires painstaking, expert-driven harness engineering: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the Harness Evolution Loop optimizes a worker agent's harness for a single task: a Worker Agent executes the task, an Evaluator Agent adversarially diagnoses failures and scores performance,…
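To make the first-level loop concrete, here is a minimal sketch of one possible reading of it, not the authors' implementation. The abstract is cut off before it says how the harness is updated, so the `reviser` step below is a placeholder assumption, as are the `Harness` fields, the function names, and the toy worker and evaluator (which stand in for model-backed agents).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Harness:
    """Hypothetical harness: a prompt plus a tuple of tool names."""
    prompt: str
    tools: Tuple[str, ...] = ()


# Assumed agent interfaces; in practice each would wrap a foundation model.
Worker = Callable[[Harness, str], str]
Evaluator = Callable[[str, str], Tuple[float, List[str]]]
Reviser = Callable[[Harness, List[str]], Harness]  # placeholder update step


def evolve_harness(task: str, harness: Harness, worker: Worker,
                   evaluator: Evaluator, reviser: Reviser,
                   rounds: int = 5) -> Tuple[Harness, float]:
    """Run worker -> evaluator -> reviser and keep the best-scoring harness."""
    best, best_score = harness, float("-inf")
    current = harness
    for _ in range(rounds):
        output = worker(current, task)              # worker executes the task
        score, failures = evaluator(task, output)   # evaluator scores and diagnoses
        if score > best_score:
            best, best_score = current, score
        if not failures:                            # nothing left to fix
            break
        current = reviser(current, failures)        # assumed harness update
    return best, best_score


# Toy stand-ins so the sketch runs without any model calls.
REQUIRED = ("cite sources", "use metric units")


def toy_worker(harness: Harness, task: str) -> str:
    return f"{harness.prompt} | task: {task}"


def toy_evaluator(task: str, output: str) -> Tuple[float, List[str]]:
    failures = [r for r in REQUIRED if r not in output]
    return 1 - len(failures) / len(REQUIRED), failures


def toy_reviser(harness: Harness, failures: List[str]) -> Harness:
    # Address one diagnosed failure per round by extending the prompt.
    return Harness(f"{harness.prompt} Always {failures[0]}.", harness.tools)


if __name__ == "__main__":
    best, score = evolve_harness("Summarize the report",
                                 Harness("Answer the question."),
                                 toy_worker, toy_evaluator, toy_reviser)
    print(score, best.prompt)
```

Keeping the best-scoring harness rather than the latest one is a design choice of this sketch: it guards against a revision that makes the score worse.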
