Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes

Jingjie Ning; Xiaochuan Li; Ji Zeng; Hao Kang; Chenyan Xiong

arXiv:2605.05724 · cs.MA · May 8, 2026

Abstract

We study auto research as a closed empirical loop driven by external measurement. Each submitted trial carries a hypothesis, an executable code edit, an evaluator-owned outcome, and feedback that shapes the next proposal. The output is not a generated paper or a single model checkpoint, but an auditable trajectory of proposals, code diffs, experiments, scores, and failure labels. We instantiate this loop with specialist agents that partition recipe surfaces and share measured lineage across trials. The central empirical finding is that lineage feedback lets agents turn evaluator outcomes, including crashes, budget overruns, size failures, and accuracy-gate misses, into later program-level recipe edits rather than one-shot suggestions. Across 1,197 headline-run trials plus 600 Parameter Golf control trials after one-time setup and launch, humans did not choose proposals, edit recipes,…
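The trial record and lineage feedback described in the abstract can be pictured with a small data-structure sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Failure`, `Outcome`, `Trial`, `Lineage`, and `run_loop` are hypothetical, the four failure labels are taken from the abstract's wording, and "higher score is better" is assumed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Failure(Enum):
    """Failure kinds named in the abstract; the enum itself is hypothetical."""
    CRASH = "crash"
    BUDGET_OVERRUN = "budget_overrun"
    SIZE_FAILURE = "size_failure"
    ACCURACY_GATE_MISS = "accuracy_gate_miss"


@dataclass(frozen=True)
class Outcome:
    """Evaluator-owned result: a score on success, a failure label otherwise."""
    score: Optional[float] = None
    failure: Optional[Failure] = None


@dataclass(frozen=True)
class Trial:
    """One submitted trial: hypothesis, executable edit, measured outcome."""
    hypothesis: str
    code_diff: str
    outcome: Outcome


@dataclass
class Lineage:
    """Auditable trajectory of trials that agents can read before proposing."""
    trials: List[Trial] = field(default_factory=list)

    def record(self, trial: Trial) -> None:
        self.trials.append(trial)

    def best(self) -> Optional[Trial]:
        # Assumption: higher score is better.
        scored = [t for t in self.trials if t.outcome.score is not None]
        return max(scored, key=lambda t: t.outcome.score, default=None)

    def failure_counts(self) -> Dict[Failure, int]:
        counts: Dict[Failure, int] = {}
        for t in self.trials:
            if t.outcome.failure is not None:
                counts[t.outcome.failure] = counts.get(t.outcome.failure, 0) + 1
        return counts


def run_loop(
    propose: Callable[[Lineage], Tuple[str, str]],
    evaluate: Callable[[str], Outcome],
    steps: int,
) -> Lineage:
    """Closed loop: propose from lineage, measure externally, record, repeat."""
    lineage = Lineage()
    for _ in range(steps):
        hypothesis, code_diff = propose(lineage)
        outcome = evaluate(code_diff)  # the proposer never sets its own outcome
        lineage.record(Trial(hypothesis, code_diff, outcome))
    return lineage
```

In this sketch the proposer only receives the lineage and the evaluator alone produces the `Outcome`, which mirrors the abstract's separation between agent proposals and evaluator-owned measurement. Failed trials stay in the lineage with their label, so a later proposal can react to a crash or a budget overrun instead of discarding it.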

Code & Models

Repositories

cxcscmu/Auto-Research-Recipes (GitHub)
