Toward Autonomous Long-Horizon Engineering for ML Research

Guoxin Chen; Jie Chen; Lei Chen; Jiale Zhao; Fanzhe Meng; Wayne Xin Zhao; Ruihua Song; Cheng Chen; Ji-Rong Wen; Kai Jia

arXiv:2604.13018·cs.CL·April 15, 2026

TL;DR

AiScientist is a system for autonomous long-horizon ML research engineering. It combines hierarchical orchestration with durable state management and improves scores on PaperBench and MLE-Bench Lite.

Contribution

The paper introduces AiScientist, a novel system that integrates structured orchestration and durable artifacts to address long-horizon ML research engineering challenges.

Findings

01

AiScientist improves PaperBench score by 10.54 points on average.

02

AiScientist achieves 81.82% on MLE-Bench Lite.

03

The File-as-Bus protocol is crucial for performance: removing it reduces scores.

Abstract

Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon engineering for ML research built on a simple principle: strong long-horizon performance requires both structured orchestration and durable state continuity. To this end, AiScientist combines hierarchical orchestration with a permission-scoped File-as-Bus workspace: a top-level Orchestrator maintains stage-level control through concise summaries and a workspace map, while specialized agents repeatedly re-ground on durable artifacts such as analyses, plans, code, and experimental evidence rather than relying primarily on conversational handoffs, yielding thin control over…
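
The orchestration pattern described in the abstract can be illustrated with a minimal sketch. This is not the AiScientist implementation: the `Workspace` class, the agent functions, the directory names, and the write-scope policy are all hypothetical, chosen only to show the idea of agents communicating through durable files while the orchestrator keeps short summaries and a workspace map.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Workspace:
    """Hypothetical shared workspace: files are the channel between agents."""
    root: Path
    # Assumed policy: agent name -> top-level directories it may write to.
    write_scopes: dict[str, set[str]] = field(default_factory=dict)

    def write(self, agent: str, rel_path: str, text: str) -> Path:
        top = Path(rel_path).parts[0]
        if top not in self.write_scopes.get(agent, set()):
            raise PermissionError(f"{agent} may not write under {top}/")
        path = self.root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
        return path

    def read(self, rel_path: str) -> str:
        return (self.root / rel_path).read_text()

    def workspace_map(self) -> list[str]:
        # Sorted list of every artifact currently in the workspace.
        files = (p for p in self.root.rglob("*") if p.is_file())
        return sorted(p.relative_to(self.root).as_posix() for p in files)


def planner(ws: Workspace) -> str:
    ws.write("planner", "plans/plan.md", "1. reproduce baseline\n")
    return "plan written"  # short summary handed back to the orchestrator


def implementer(ws: Workspace) -> str:
    # Re-ground on the durable artifact instead of a conversational handoff.
    plan = ws.read("plans/plan.md")
    ws.write("implementer", "code/train.py", f"# implements: {plan.strip()}\n")
    return "code written"


def orchestrate(ws: Workspace) -> tuple[list[str], list[str]]:
    # The orchestrator holds only stage summaries and the workspace map.
    summaries = [stage(ws) for stage in (planner, implementer)]
    return summaries, ws.workspace_map()


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        scopes = {"planner": {"plans"}, "implementer": {"code"}}
        return orchestrate(Workspace(Path(tmp), scopes))
```

In this sketch, `demo()` runs both stages in a temporary directory. The implementer never receives the planner's output directly; it reads `plans/plan.md` from disk, so a restarted or replaced agent could pick up from the same artifact.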

Code & Models

Repositories

aweai-team/AiScientist (GitHub)