TL;DR
AceGRPO is an adaptive-curriculum RL method that combines an evolving data buffer with dynamic task prioritization to improve autonomous machine learning engineering (MLE) performance.
Contribution
The paper proposes AceGRPO, an RL-based framework that pairs adaptive task sampling with reuse of execution traces as training data, targeting the execution latency and inefficient data selection that hinder RL on long-horizon autonomous MLE tasks.
Findings
The trained Ace-30B model achieved a 100% valid submission rate on MLE-Bench-Lite.
It outperformed larger open-source baselines such as DeepSeek-V3.2.
It approached the performance of proprietary frontier models.
Abstract
Autonomous Machine Learning Engineering (MLE) requires agents to perform sustained, iterative optimization over long horizons. While recent LLM-based agents show promise, current prompt-based agents for MLE suffer from behavioral stagnation due to frozen parameters. Although Reinforcement Learning (RL) offers a remedy, applying it to MLE is hindered by prohibitive execution latency and inefficient data selection. Recognizing these challenges, we propose AceGRPO with two core components: (1) Evolving Data Buffer that continuously repurposes execution traces into reusable training tasks, and (2) Adaptive Sampling guided by a Learnability Potential function, which dynamically prioritizes tasks at the agent's learning frontier to maximize learning efficiency. Leveraging AceGRPO, our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of…
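The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the Learnability Potential function, so the code assumes a stand-in, p * (1 - p) over a smoothed success rate p, which peaks for tasks the agent solves about half the time. The names `Task`, `EvolvingBuffer`, `add_from_trace`, and `learnability` are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    """A training task with running rollout statistics (hypothetical structure)."""
    name: str
    attempts: int = 0
    successes: int = 0

    def success_rate(self) -> float:
        # Laplace smoothing: a task with no rollouts yet starts at 0.5.
        return (self.successes + 1) / (self.attempts + 2)


def learnability(task: Task) -> float:
    """Assumed stand-in for a learnability potential: p * (1 - p).

    It is largest at p = 0.5, i.e. for tasks that are neither
    reliably solved nor reliably failed.
    """
    p = task.success_rate()
    return p * (1.0 - p)


@dataclass
class EvolvingBuffer:
    """Holds tasks and grows as execution traces are turned into new tasks."""
    tasks: list = field(default_factory=list)

    def add_from_trace(self, parent: Task, step: int) -> Task:
        # Repurpose an intermediate state of a trace as a new starting task.
        # The naming scheme is illustrative only.
        child = Task(name=f"{parent.name}/step{step}")
        self.tasks.append(child)
        return child

    def sample(self, k: int, rng: random.Random) -> list:
        # Draw k tasks with probability proportional to learnability.
        weights = [learnability(t) for t in self.tasks]
        return rng.choices(self.tasks, weights=weights, k=k)


if __name__ == "__main__":
    easy = Task("easy", attempts=10, successes=10)
    hard = Task("hard", attempts=10, successes=0)
    frontier = Task("frontier", attempts=10, successes=5)
    buffer = EvolvingBuffer(tasks=[easy, hard, frontier])
    buffer.add_from_trace(frontier, step=3)  # unseen child task, p = 0.5

    batch = buffer.sample(k=8, rng=random.Random(0))
    print([t.name for t in batch])
```

Under this assumed potential, tasks that are always solved or always failed receive little sampling weight, while partially solved tasks and freshly added trace-derived tasks dominate each batch. Any function that peaks at intermediate difficulty could be substituted without changing the buffer logic.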
