Effective Harness Engineering for Algorithm Discovery with Coding Agents

Yoichi Ishibashi; Taro Yano; Masafumi Oyamada

arXiv:2605.15221 · cs.SE · May 18, 2026

TL;DR

This paper studies how the design of the execution harness affects the success of algorithm discovery with LLMs and evolutionary search. It argues for spending more thinking on each algorithm rather than generating a larger number of algorithms.

Contribution

It introduces Vesper, an algorithm discovery framework whose harness improvements address three issues: splitting a fixed token budget, handling evaluation hacks, and running agents that need full filesystem access safely in parallel.
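The contribution mentions safe parallel execution for agents that need full filesystem access. This summary does not say how Vesper achieves it, so what follows is only a minimal sketch of one common approach. It assumes each agent can work in a private copy of a template workspace: every run gets its own temporary directory, so two agents writing to the same relative path never collide. `run_isolated`, `agent_command`, and the toy agent script are hypothetical stand-ins. A copied directory only prevents accidental interference; it is not a security sandbox.

```python
import shutil
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_isolated(template: Path, agent_id: int, command: list[str]) -> tuple[int, str]:
    """Hypothetical helper: run one agent command inside a private workspace copy."""
    with tempfile.TemporaryDirectory(prefix=f"agent{agent_id}_") as tmp:
        workspace = Path(tmp) / "workspace"
        shutil.copytree(template, workspace)  # private copy; the template stays clean
        result = subprocess.run(
            command, cwd=workspace, capture_output=True, text=True,
            timeout=60, check=True,
        )
        # The copied workspace is deleted when this with-block exits.
        return agent_id, result.stdout.strip()


def agent_command(agent_id: int) -> list[str]:
    """Toy stand-in for a coding agent: append its id to notes.txt, then print the file."""
    script = (
        "import sys\n"
        "with open('notes.txt', 'a') as f: f.write(sys.argv[1] + '\\n')\n"
        "print(' '.join(open('notes.txt').read().split()))"
    )
    return [sys.executable, "-c", script, str(agent_id)]


def main() -> dict[int, str]:
    with tempfile.TemporaryDirectory() as tmpl:
        template = Path(tmpl)
        (template / "notes.txt").write_text("base\n")
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(run_isolated, template, i, agent_command(i)) for i in range(4)]
            results = dict(f.result() for f in futures)
        # No agent wrote into the shared template.
        assert (template / "notes.txt").read_text() == "base\n"
        return results


if __name__ == "__main__":
    print(main())  # {0: 'base 0', 1: 'base 1', 2: 'base 2', 3: 'base 3'}
```

Each agent sees `base` plus only its own id, even though all four write to the same relative filename at the same time. A real harness would probably also need container or OS-level isolation on top of per-agent workspaces.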

Findings

01

Fewer, deeper algorithms outperform many shallow ones within the same token budget.

02

Higher-capability models tend to generate more evaluation hacks, necessitating better detection.

03

Deeper thinking per algorithm is more cost-effective than increasing the number of algorithms.
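Findings 01 and 03 both concern how a fixed token budget is split. Under a simple cost model assumed here (it is not taken from the paper), each algorithm costs t thinking tokens plus a fixed overhead c for the prompt and the emitted code, so a budget B pays for N = ⌊B / (t + c)⌋ algorithms. The sketch tabulates that trade-off. `allocate`, `Allocation`, the overhead figure, and the budget are illustrative values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    thinking_tokens: int  # reasoning tokens spent on each generated algorithm
    num_algorithms: int   # how many algorithms the budget can pay for


def allocate(total_budget: int, overhead_tokens: int, thinking_options: list[int]) -> list[Allocation]:
    """For each per-algorithm thinking depth, count how many algorithms fit in the budget.

    overhead_tokens is a hypothetical fixed cost per algorithm (prompt plus emitted code).
    """
    plans = []
    for thinking in thinking_options:
        per_algorithm = thinking + overhead_tokens
        plans.append(Allocation(thinking, total_budget // per_algorithm))
    return plans


if __name__ == "__main__":
    for plan in allocate(1_000_000, 2_000, [2_000, 8_000, 18_000]):
        print(plan)
    # Allocation(thinking_tokens=2000, num_algorithms=250)
    # Allocation(thinking_tokens=8000, num_algorithms=100)
    # Allocation(thinking_tokens=18000, num_algorithms=50)
```

With these made-up numbers, raising per-algorithm thinking from 2,000 to 18,000 tokens cuts the candidate count from 250 to 50. The findings report that, at equal budget, the fewer-but-deeper end of such a range works better.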

Abstract

AlphaEvolve and FunSearch have demonstrated the potential of combining large language models (LLMs) with evolutionary search for automated algorithm discovery. However, discovery success is shaped not only by model capability but also significantly by the design of the execution infrastructure, i.e., the harness. This paper investigates effective harness design through three questions: under a fixed token budget, is it better to produce many algorithms with brief thought or fewer algorithms with deeper thought? How should the harness handle evaluation hacks, where generated programs exploit the scoring function? And how can agents that require full filesystem access execute safely in parallel? Using Vesper, an algorithm discovery framework that incorporates harness improvements addressing these questions, we evaluate on Circle Packing under the same token budget. Interestingly,…
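The abstract defines evaluation hacks as generated programs that exploit the scoring function. One common defense, sketched here under assumptions, is for the harness to recompute the score itself from the program's output and reject invalid geometry, instead of trusting any number the program reports. The sketch assumes a common Circle Packing formulation (circles inside the unit square, maximizing the sum of radii). The abstract does not state which variant or which checks Vesper uses, and `verified_score` is a hypothetical name.

```python
import math

Circle = tuple[float, float, float]  # (x, y, radius)


def verified_score(circles: list[Circle], tol: float = 1e-9) -> float:
    """Hypothetical harness-side scorer for unit-square Circle Packing.

    Recomputes the sum of radii from the returned geometry and raises
    ValueError on any invalid packing, instead of trusting a score that
    the generated program reports about itself.
    """
    for i, (x, y, r) in enumerate(circles):
        if not all(math.isfinite(v) for v in (x, y, r)) or r < 0:
            raise ValueError(f"circle {i}: non-finite value or negative radius")
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            raise ValueError(f"circle {i}: extends outside the unit square")
    for i in range(len(circles)):
        xi, yi, ri = circles[i]
        for j in range(i + 1, len(circles)):
            xj, yj, rj = circles[j]
            # Centers closer than the sum of radii means the discs overlap.
            if math.hypot(xi - xj, yi - yj) < ri + rj - tol:
                raise ValueError(f"circles {i} and {j} overlap")
    return sum(r for _, _, r in circles)


if __name__ == "__main__":
    grid = [(0.25, 0.25, 0.25), (0.75, 0.25, 0.25), (0.25, 0.75, 0.25), (0.75, 0.75, 0.25)]
    print(verified_score(grid))  # 1.0
    stacked = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]  # same disc twice would inflate the sum
    try:
        verified_score(stacked)
    except ValueError as err:
        print("rejected:", err)  # rejected: circles 0 and 1 overlap
```

Checking the output geometry catches only one class of hack. A program that tampers with the evaluator itself needs separate safeguards, for example running the scorer outside the agent's writable workspace.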
