BUILD-AND-FIND: An Effort-Aware Protocol for Evaluating Agent-Managed Codebases
Jhen-Ke Lin

TL;DR
BUILD-AND-FIND is a protocol for evaluating how well downstream agents can recover intended design choices from generated repositories, emphasizing inspection effort, recovery accuracy, and artifact clarity.
Contribution
The paper introduces an evaluation protocol that separates behavioral correctness from artifact interpretability in agent-managed codebases.
Findings
Recovery accuracy is near saturation in the high-prior task pack.
Lower effort correlates with artifacts that make intent easier to locate.
The protocol effectively measures inspection effort and recovery reliability.
Abstract
Most coding-agent benchmarks ask whether generated code behaves correctly. That remains essential, but repository-level engineering is increasingly agent-managed: one agent writes a repository, and later agents inspect, audit, or extend it as working context. In that setting, a generated repository is not only an answer to a task but also a communication artifact for future work. Even when strong agents nearly satisfy the visible behavioral objective, repositories can differ in how clearly they expose the intended behavior and design choices behind that behavior. We introduce BUILD-AND-FIND, a protocol for evaluating whether downstream agents can recover those intended choices from generated repositories, and how much inspection that recovery requires. For each task, a builder sees a hidden repository specification and creates a codebase; a finder sees only the codebase and a…
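As a rough illustration of the two quantities the abstract emphasizes (whether a finder recovers the intended choice, and how much inspection that takes), the following minimal sketch aggregates per-trial records into a recovery accuracy and a mean effort. It is not the paper's implementation: the `FinderTrial` record, the `summarize` helper, the task ids, and the use of "files inspected" as an effort proxy are all assumptions made for this example, and the sample values are invented purely to exercise the code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FinderTrial:
    """One finder attempt on one generated repository (hypothetical record)."""
    task_id: str
    recovered_correctly: bool  # did the finder recover the intended choice?
    files_inspected: int       # assumed proxy for inspection effort


def summarize(trials):
    """Return (recovery accuracy, mean inspection effort) over all trials."""
    if not trials:
        raise ValueError("need at least one trial")
    accuracy = mean(1.0 if t.recovered_correctly else 0.0 for t in trials)
    effort = mean(t.files_inspected for t in trials)
    return accuracy, effort


# Invented sample data, for illustration only.
trials = [
    FinderTrial("task-a", True, 3),
    FinderTrial("task-a", True, 9),
    FinderTrial("task-b", False, 12),
    FinderTrial("task-b", True, 4),
]
accuracy, effort = summarize(trials)
print(f"recovery accuracy: {accuracy:.2f}, mean files inspected: {effort}")
```

Reporting the two numbers separately reflects the point made in the findings: when accuracy is near saturation, effort is the quantity that can still distinguish repositories.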
