AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering

Yuzhu Cai; Zexi Liu; Xinyu Zhu; Cheng Wang; Yanfeng Wang; Siheng Chen

arXiv:2602.07906 · cs.LG · May 8, 2026


TL;DR

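AceGRPO is an adaptive curriculum method for reinforcement-learning training of autonomous machine learning engineering (MLE) agents. It combines an evolving data buffer with dynamic task prioritization, and the resulting Ace-30B model reaches a 100% valid submission rate on MLE-Bench-Lite.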

Contribution

The paper proposes AceGRPO, an RL-based framework that pairs adaptive sampling with data reuse (execution traces are repurposed into training tasks) to improve training efficiency and effectiveness on long-horizon autonomous MLE tasks.
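As background for the "Group Relative" part of the name, the sketch below shows the advantage computation commonly used in GRPO: each rollout's reward is normalized against the other rollouts sampled for the same task. This is the standard formulation, not necessarily the exact variant used in AceGRPO, which this page does not specify. The function name, the `eps` parameter, and the choice of population standard deviation are assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Assumption: population standard deviation and a small eps for
    numerical stability; implementations differ on both details.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # Rollouts better than the group average get a positive advantage,
    # worse ones a negative advantage. If all rewards are equal, every
    # advantage is zero and the group contributes no learning signal.
    return [(r - mean) / (std + eps) for r in rewards]
```

A group whose rollouts all succeed or all fail yields zero advantages everywhere, which is one reason a curriculum that favors tasks of intermediate difficulty can make better use of expensive rollouts.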

Findings

01 Achieved a 100% valid submission rate on MLE-Bench-Lite.

02 Outperformed larger open-source baselines like DeepSeek-V3.2.

03 Approached performance of proprietary frontier models.

Abstract

Autonomous Machine Learning Engineering (MLE) requires agents to perform sustained, iterative optimization over long horizons. While recent LLM-based agents show promise, current prompt-based agents for MLE suffer from behavioral stagnation due to frozen parameters. Although Reinforcement Learning (RL) offers a remedy, applying it to MLE is hindered by prohibitive execution latency and inefficient data selection. Recognizing these challenges, we propose AceGRPO with two core components: (1) Evolving Data Buffer that continuously repurposes execution traces into reusable training tasks, and (2) Adaptive Sampling guided by a Learnability Potential function, which dynamically prioritizes tasks at the agent's learning frontier to maximize learning efficiency. Leveraging AceGRPO, our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of…
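The two components named in the abstract can be illustrated with a minimal sketch. The abstract does not define the Learnability Potential function, so the code below substitutes a simple assumed stand-in, p(1 - p) over a smoothed success rate p, which peaks for tasks the agent solves about half the time. The `Task` schema, the `EvolvingBuffer` class, and all method names are hypothetical and are not taken from the paper or its repository.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    """A reusable training task derived from an execution trace (hypothetical schema)."""
    task_id: str
    successes: int = 0
    attempts: int = 0

    def success_rate(self) -> float:
        # Laplace smoothing: an unseen task starts at 0.5 and the
        # estimate never reaches exactly 0 or 1.
        return (self.successes + 1) / (self.attempts + 2)


def learnability(task: Task) -> float:
    """Assumed stand-in for the Learnability Potential: p * (1 - p).

    Highest at p = 0.5, low for tasks that are almost always solved
    or almost always failed.
    """
    p = task.success_rate()
    return p * (1.0 - p)


@dataclass
class EvolvingBuffer:
    """Task pool that grows as new execution traces are turned into tasks."""
    tasks: dict = field(default_factory=dict)

    def add_from_trace(self, task_id: str) -> None:
        # Register a task once; re-adding keeps its existing statistics.
        self.tasks.setdefault(task_id, Task(task_id))

    def record(self, task_id: str, solved: bool) -> None:
        # Update outcome statistics after a rollout on this task.
        task = self.tasks[task_id]
        task.attempts += 1
        task.successes += int(solved)

    def sample(self, k: int, rng: random.Random) -> list:
        # Draw k task ids (with replacement), weighted by learnability.
        pool = list(self.tasks.values())
        weights = [learnability(t) for t in pool]
        return [t.task_id for t in rng.choices(pool, weights=weights, k=k)]
```

In use, a training loop would call `add_from_trace` whenever a rollout produces a new intermediate state worth revisiting, `record` after each evaluated rollout, and `sample` to pick the next batch. Under this stand-in, tasks with a success rate near 50% are drawn most often, which is one plausible reading of "the agent's learning frontier".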

Code & Models

Repositories

yuzhu-cai/AceGRPO (GitHub)
