Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Yuzhi Zhang, Linfeng Zhang, Weinan E, Siheng Chen, Yanfeng Wang

TL;DR
This paper introduces ML-Master 2.0, an autonomous agent for ultra-long-horizon machine learning engineering. It uses Hierarchical Cognitive Caching to sustain strategic coherence over days or weeks, a time scale on which short-horizon LLM reasoning tends to break down.
Contribution
It proposes Hierarchical Cognitive Caching, a novel multi-tiered architecture that enables long-term experience accumulation and strategic planning in autonomous AI agents.
Findings
Achieved a 56.44% medal rate on OpenAI's MLE-Bench within 24 hours.
Demonstrated effective long-term knowledge consolidation beyond static context limits.
Showed that ultra-long-horizon autonomy enables scalable scientific exploration.
Abstract
The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy, the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), which is a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Big Data and Digital Economy
