Towards Execution-Grounded Automated AI Research

Chenglei Si; Zitong Yang; Yejin Choi; Emmanuel Candès; Diyi Yang; Tatsunori Hashimoto

arXiv:2601.14525·cs.CL·January 22, 2026

TL;DR

This paper tests whether AI research ideas can be executed automatically and whether LLMs can learn from the resulting execution feedback. It builds an automated executor, applies it to LLM pre-training and post-training, and analyzes both the potential and the limitations of execution-guided evolutionary search and reinforcement learning.

Contribution

The paper introduces an automated executor that implements AI research ideas, evaluates execution-guided evolutionary search and reinforcement learning as ways to learn from execution feedback, and analyzes the effectiveness and limitations of both.

Findings

01

Execution-guided evolutionary search is sample-efficient and outperforms baselines.

02

The automated executor successfully implements a large fraction of the ideas sampled from frontier LLMs.

03

Reinforcement learning improves average reward but suffers from mode collapse.

Abstract

Automated AI research holds great potential to accelerate scientific discovery. However, current LLMs often generate plausible-looking but ineffective ideas. Execution grounding may help, but it is unclear whether automated execution is feasible and whether LLMs can learn from the execution feedback. To investigate these, we first build an automated executor to implement ideas and launch large-scale parallel GPU experiments to verify their effectiveness. We then convert two realistic research problems - LLM pre-training and post-training - into execution environments and demonstrate that our automated executor can implement a large fraction of the ideas sampled from frontier LLMs. We analyze two methods to learn from the execution feedback: evolutionary search and reinforcement learning. Execution-guided evolutionary search is sample-efficient: it finds a method that significantly…
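
The execution-guided evolutionary search mentioned in the abstract can be pictured with a minimal sketch. This is not the authors' implementation. It is a generic evolutionary loop under stated assumptions: an "idea" is reduced to a list of floats, `toy_executor` is a hypothetical stand-in for running a real experiment and returning a reward, and `mutate` stands in for whatever procedure proposes new ideas from earlier ones. All names and parameters are illustrative.

```python
import random
from typing import Callable, List, Tuple

# Assumption: an "idea" is a list of floats. Real research ideas would be
# text or code; this is only a stand-in so the loop can run.
Idea = List[float]


def toy_executor(idea: Idea) -> float:
    """Hypothetical executor: returns a reward, higher is better.

    A real executor would implement the idea and run experiments.
    """
    return -sum((x - 0.5) ** 2 for x in idea)


def mutate(idea: Idea, rng: random.Random, scale: float = 0.1) -> Idea:
    """Propose a new idea by perturbing an existing one (illustrative only)."""
    return [x + rng.gauss(0.0, scale) for x in idea]


def evolutionary_search(
    execute: Callable[[Idea], float],
    dim: int = 3,
    population_size: int = 8,
    num_survivors: int = 2,
    generations: int = 20,
    seed: int = 0,
) -> Tuple[Idea, float, List[float]]:
    """Run a simple elitist evolutionary loop guided by execution feedback."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(population_size)
    ]
    best_idea, best_score = None, float("-inf")
    best_per_generation: List[float] = []

    for _ in range(generations):
        # Execute every idea and rank by the measured reward.
        scored = sorted(
            ((execute(idea), idea) for idea in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        if scored[0][0] > best_score:
            best_score, best_idea = scored[0]
        best_per_generation.append(best_score)

        # Keep the top ideas and fill the rest with mutated children.
        survivors = [idea for _, idea in scored[:num_survivors]]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - num_survivors)
        ]
        population = survivors + children

    return best_idea, best_score, best_per_generation


if __name__ == "__main__":
    idea, score, history = evolutionary_search(toy_executor)
    print("best reward:", score)
```

The key design point is that selection depends only on the reward returned by `execute`, so the same loop would apply if the executor were replaced by one that actually runs experiments. Because the survivors are carried over unchanged, the best reward recorded per generation never decreases in this sketch.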

Taxonomy

Topics: Scientific Computing and Data Management · Machine Learning in Materials Science · Teaching and Learning Programming