Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Xinyu Zhu; Yuzhu Cai; Zexi Liu; Bingyang Zheng; Cheng Wang; Rui Ye; Yuzhi Zhang; Linfeng Zhang; Weinan E; Siheng Chen; Yanfeng Wang

arXiv:2601.10402 · cs.AI · March 26, 2026

TL;DR

This paper introduces ML-Master 2.0, an autonomous agent for ultra-long-horizon machine learning engineering. It uses Hierarchical Cognitive Caching (HCC) to sustain strategic coherence over experiments spanning days or weeks, surpassing previous agents built for short-horizon tasks.

Contribution

The paper proposes Hierarchical Cognitive Caching (HCC), a multi-tiered architecture that lets autonomous AI agents accumulate experience and plan strategically over long horizons.

Findings

1. Achieved a 56.44% medal rate on OpenAI's MLE-Bench within 24 hours.
2. Demonstrated effective long-term knowledge consolidation beyond static context limits.
3. Showed that ultra-long-horizon autonomy enables scalable scientific exploration.

Abstract

The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, and they fail to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired…
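
The abstract is truncated before HCC's tiers are described, so the sketch below does not reproduce the paper's design. It is a hypothetical illustration of the general idea of tiered context accumulation, assuming three invented tiers: recent raw execution events, one summary per finished experimental cycle, and durable insights keyed by topic. The class `TieredContext` and its methods are made up for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class TieredContext:
    """Hypothetical three-tier context store; an illustration, not the paper's HCC."""

    working_capacity: int = 4  # max raw events kept verbatim in tier 1
    working: Deque[str] = field(default_factory=deque)  # tier 1: recent raw execution events
    episodes: List[str] = field(default_factory=list)  # tier 2: one summary per finished cycle
    knowledge: Dict[str, str] = field(default_factory=dict)  # tier 3: durable insights by topic

    def record(self, event: str) -> None:
        """Append a raw event, evicting the oldest ones once capacity is exceeded."""
        self.working.append(event)
        while len(self.working) > self.working_capacity:
            self.working.popleft()

    def consolidate_cycle(self, summary: str, insight: Optional[Tuple[str, str]] = None) -> None:
        """Close an experimental cycle: store its summary, discard raw events,
        and optionally promote an insight to long-term knowledge."""
        self.episodes.append(summary)
        self.working.clear()
        if insight is not None:
            topic, lesson = insight
            self.knowledge[topic] = lesson  # a newer lesson on the same topic replaces the old one

    def build_context(self, max_episodes: int = 2) -> str:
        """Assemble prompt text: all knowledge, the latest episode summaries, then raw events."""
        parts = [f"[knowledge] {topic}: {lesson}" for topic, lesson in sorted(self.knowledge.items())]
        recent_episodes = self.episodes[-max_episodes:] if max_episodes > 0 else []
        parts += [f"[episode] {summary}" for summary in recent_episodes]
        parts += [f"[recent] {event}" for event in self.working]
        return "\n".join(parts)


ctx = TieredContext(working_capacity=2)
for event in ["load data", "train model A", "train model B"]:
    ctx.record(event)  # "load data" is evicted once capacity 2 is exceeded
ctx.consolidate_cycle("cycle 1: model B beat model A", ("modeling", "prefer model B family"))
ctx.record("tune model B")
print(ctx.build_context())
# [knowledge] modeling: prefer model B family
# [episode] cycle 1: model B beat model A
# [recent] tune model B
```

In this toy design, raw events live only until their cycle closes, which bounds prompt growth, while summaries and insights persist across cycles. A real agent would presumably generate summaries and insights with an LLM rather than receive them as arguments.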

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Big Data and Digital Economy