TL;DR
AiScientist is a system for autonomous long-horizon ML research engineering that combines hierarchical orchestration with durable state management, improving scores on PaperBench and MLE-Bench Lite.
Contribution
The paper introduces AiScientist, a system that integrates structured orchestration with durable artifacts to address the challenges of long-horizon ML research engineering.
Findings
AiScientist improves PaperBench score by 10.54 points on average.
Achieves 81.82% on MLE-Bench Lite.
The File-as-Bus protocol is crucial for performance: removing it reduces scores.
Abstract
Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon engineering for ML research built on a simple principle: strong long-horizon performance requires both structured orchestration and durable state continuity. To this end, AiScientist combines hierarchical orchestration with a permission-scoped File-as-Bus workspace: a top-level Orchestrator maintains stage-level control through concise summaries and a workspace map, while specialized agents repeatedly re-ground on durable artifacts such as analyses, plans, code, and experimental evidence rather than relying primarily on conversational handoffs, yielding thin control over…
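The abstract describes the File-as-Bus workspace only at a high level. As a rough illustration, here is a minimal Python sketch. It assumes a shared directory tree in which each agent role may write only to its own subdirectories, any agent may read every artifact, and the Orchestrator keeps a compact path listing (a "workspace map") in context rather than the file contents. The class `FileBus`, the role names, and the directory layout are hypothetical and are not taken from the paper.

```python
from pathlib import Path
import tempfile


class FileBus:
    """Hypothetical permission-scoped shared workspace (illustrative, not the authors' code)."""

    def __init__(self, root: Path, write_scopes: dict[str, set[str]]):
        self.root = root.resolve()
        # Maps an agent role to the top-level directories it is allowed to write to.
        self.write_scopes = write_scopes

    def _resolve(self, rel_path: str) -> Path:
        path = (self.root / rel_path).resolve()
        # Reject any path that escapes the workspace root (e.g. "../x").
        if self.root not in path.parents:
            raise PermissionError(f"{rel_path!r} is outside the workspace")
        return path

    def write(self, agent: str, rel_path: str, text: str) -> None:
        path = self._resolve(rel_path)
        top = path.relative_to(self.root).parts[0]
        # Writes are scoped per role, so one agent cannot overwrite another's artifacts.
        if top not in self.write_scopes.get(agent, set()):
            raise PermissionError(f"{agent} may not write to {top}/")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)

    def read(self, rel_path: str) -> str:
        # Reads are unrestricted so every agent can re-ground on durable artifacts.
        return self._resolve(rel_path).read_text()

    def workspace_map(self) -> list[str]:
        """Sorted list of artifact paths, small enough for the Orchestrator to keep in context."""
        return sorted(
            p.relative_to(self.root).as_posix()
            for p in self.root.rglob("*")
            if p.is_file()
        )


# Usage: three hypothetical roles sharing one temporary workspace.
with tempfile.TemporaryDirectory() as tmp:
    bus = FileBus(Path(tmp), {
        "analyst": {"analysis"},
        "planner": {"plans"},
        "coder": {"code", "experiments"},
    })
    bus.write("analyst", "analysis/task.md", "Task summary.")
    bus.write("planner", "plans/plan.md", "1. Set up env\n2. Train baseline")
    print(bus.workspace_map())  # ['analysis/task.md', 'plans/plan.md']
    try:
        bus.write("coder", "plans/plan.md", "overwrite")
    except PermissionError as err:
        print(err)  # coder may not write to plans/
```

In this sketch, handoffs go through files instead of conversation, so a specialist that restarts can re-read analyses and plans from disk. The Orchestrator only needs the short path list to keep stage-level control.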
