Harnesses for Inference-Time Alignment over Execution Trajectories
Boyuan Wang, Bochao Li, Minghan Wang, Yuxin Tao, Fang Kong

TL;DR
This paper analyzes harness engineering for LLM agents at inference time, focusing on how task decomposition and guided execution affect final task success. It shows that more elaborate harnesses are not uniformly better and that partial harnesses can achieve higher success rates than fully structured workflows.
Contribution
The paper introduces an inference-time trajectory alignment perspective that quantifies the effects of harness design, identifies concrete failure modes such as over-decomposition, and demonstrates that partial harnesses can be effective.
Findings
Over-decomposition and over-pruning can reduce success.
Guided execution reshapes local action distributions.
Partial harnesses can outperform fully structured workflows.
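
The first two findings can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes every sub-goal succeeds independently with the same per-attempt probability, and it treats guidance as a multiplicative reweighting of a local action distribution followed by renormalization. The names `workflow_success`, `reweight`, and all numeric values are hypothetical and chosen only for illustration.

```python
def workflow_success(step_success: float, num_steps: int, attempts_per_step: int) -> float:
    """Toy estimate: probability that every sub-goal succeeds within its attempt budget.

    Assumes independent sub-goals with identical per-attempt success (an
    illustrative simplification, not the paper's formulation).
    """
    per_step = 1.0 - (1.0 - step_success) ** attempts_per_step
    return per_step ** num_steps


def reweight(action_probs: dict, guidance_weights: dict) -> dict:
    """Toy guided execution: scale each action's probability, then renormalize.

    Actions missing from guidance_weights keep weight 1.0; a weight of 0.0
    prunes the action entirely.
    """
    scaled = {a: p * guidance_weights.get(a, 1.0) for a, p in action_probs.items()}
    total = sum(scaled.values())
    if total <= 0:
        raise ValueError("guidance pruned every action")
    return {a: v / total for a, v in scaled.items()}


if __name__ == "__main__":
    # Finer decomposition at fixed per-step reliability compounds failure risk.
    for steps in (3, 10):
        print(steps, round(workflow_success(0.9, steps, attempts_per_step=1), 4))
    # A larger retry budget recovers part of the loss.
    print(round(workflow_success(0.9, 10, attempts_per_step=2), 4))
    # Pruning shifts all probability mass onto the remaining action.
    print(reweight({"good": 0.5, "bad": 0.5}, {"bad": 0.0}))
```

In this toy setting, holding per-step reliability fixed while adding steps lowers end-to-end success, and a guidance weight of zero on an action that the task actually needs would make success impossible. In practice finer decomposition may also raise per-step reliability, which this sketch deliberately ignores.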
Abstract
Harness engineering has emerged as an important inference-time technique for large language model (LLM) agents, aiming to improve long-term performance through task decomposition and guided execution. However, more elaborate harnesses are not uniformly better: increasing decomposition or guidance can sometimes improve execution, but can also reduce final task success. We study harness design through the lens of inference-time trajectory alignment. This perspective separates a harness into two mechanisms: task decomposition, which structures a task into sub-goals, and guided execution, which reshapes local action distributions during execution. This separation allows us to quantify how workflow granularity, retry budgets, and guidance-induced action reweighting shape the performance limits of harness design. It further reveals concrete failure modes, including over-decomposition,…
