Synthetic Sandbox for Training Machine Learning Engineering Agents

Yuhang Zhou; Lizhu Zhang; Yifan Wu; Jiayi Liu; Xiangjun Fan; Zhuokai Zhao; Hong Yan

arXiv:2604.04872 · cs.CL · April 7, 2026

TL;DR

SandMLE introduces a synthetic sandbox environment for efficient, large-scale on-policy reinforcement learning (RL) of machine learning engineering (MLE) agents, substantially reducing computational cost while preserving problem complexity.

Contribution

The paper presents SandMLE, a framework that creates diverse, verifiable synthetic MLE environments from a small number of seed tasks, enabling scalable on-policy RL for MLE agents.
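A minimal sketch of the seed-to-variants idea, assuming a task can be summarized by a handful of parameters. SandMLE itself uses a multi-agent framework to generate environments; this sketch swaps that for plain random perturbation of a hypothetical `TaskSpec`, and caps `n_rows` to reflect the paper's observation that sandbox data size drives verification cost. `TaskSpec`, `synthesize_variants`, and every field name are illustrative assumptions, not the authors' code.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Set, Tuple


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical summary of one MLE sandbox task."""
    name: str
    task_type: str   # e.g. "regression" or "classification"
    n_features: int
    n_rows: int      # sandbox dataset size, the main driver of verification cost
    noise: float
    metric: str      # e.g. "rmse"


def synthesize_variants(seed_task: TaskSpec, k: int, max_rows: int, seed: int = 0) -> List[TaskSpec]:
    """Derive up to k distinct, small variants of one seed task by random parameter perturbation.

    Hypothetical stand-in for LLM-driven generation: only numeric knobs change;
    task type and metric are inherited from the seed task.
    """
    rng = random.Random(seed)
    variants: List[TaskSpec] = []
    seen: Set[Tuple[int, int, float]] = set()
    attempts = 0
    while len(variants) < k and attempts < 100 * k:
        attempts += 1
        candidate = replace(
            seed_task,
            name=f"{seed_task.name}-v{len(variants)}",
            n_features=max(1, seed_task.n_features + rng.randint(-3, 3)),
            n_rows=rng.randint(max(1, max_rows // 10), max_rows),  # cap sandbox data size
            noise=round(seed_task.noise * rng.uniform(0.5, 2.0), 3),
        )
        key = (candidate.n_features, candidate.n_rows, candidate.noise)
        if key not in seen:  # keep only structurally distinct variants
            seen.add(key)
            variants.append(candidate)
    return variants
```

For instance, a seed task with 1,000,000 rows passed through `synthesize_variants(base, k=8, max_rows=5_000)` yields eight distinct variants whose datasets each hold at most 5,000 rows, so every environment stays cheap to verify.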

Findings

01. SandMLE reduces execution time by more than 13×.
02. SandMLE training achieves significant performance gains over supervised fine-tuning (SFT) baselines.
03. The trained policies generalize well to unseen agentic scaffolds.

Abstract

As large language model agents advance beyond software engineering (SWE) tasks toward machine learning engineering (MLE), verifying agent behavior becomes orders of magnitude more expensive: while SWE tasks can be verified via fast-executing unit tests, MLE verification requires running full ML pipelines (data preprocessing, model training, and metric evaluation) on large datasets at each rollout step, rendering trajectory-wise on-policy reinforcement learning (RL) prohibitively slow. Existing approaches retreat to supervised fine-tuning (SFT) or offline proxy rewards, sacrificing the exploration and generalization benefits of on-policy RL. We observe that sandbox data size is the primary source of this bottleneck. Based on this insight, we introduce SandMLE, a multi-agent framework that generates diverse, verifiable synthetic MLE environments from a small number of seed tasks,…
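To make the verification bottleneck concrete, here is a toy verifier. It treats a "submission" as a function that trains on the training split and returns a predictor, runs it end to end, and maps the held-out error to a reward. Every name here (`make_synthetic_task`, `verify`, `mean_submission`, `gd_submission`) and the clipped R²-style reward are hypothetical illustrations, not SandMLE's actual verifier. The point is only that each reward call pays for training plus evaluation over the sandbox data, so its cost scales with the number of rows.

```python
import random
from typing import Callable, List, Tuple

Row = Tuple[List[float], float]
Predictor = Callable[[List[float]], float]


def make_synthetic_task(n_rows: int, n_features: int, noise: float, seed: int) -> List[Row]:
    """Toy regression sandbox: y is a hidden linear function of x plus Gaussian noise."""
    rng = random.Random(seed)
    weights = [rng.uniform(-2.0, 2.0) for _ in range(n_features)]
    rows: List[Row] = []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        y = sum(w * xi for w, xi in zip(weights, x)) + rng.gauss(0.0, noise)
        rows.append((x, y))
    return rows


def _mse(predict: Predictor, rows: List[Row]) -> float:
    return sum((predict(x) - y) ** 2 for x, y in rows) / len(rows)


def verify(submission: Callable[[List[Row]], Predictor], rows: List[Row], test_frac: float = 0.2) -> float:
    """Run a submitted pipeline end to end and turn its held-out score into a reward in [0, 1]."""
    split = int(len(rows) * (1.0 - test_frac))
    train, test = rows[:split], rows[split:]
    predict = submission(train)                  # "model training"
    mse = _mse(predict, test)                    # "metric evaluation"
    mu = sum(y for _, y in train) / len(train)
    baseline = _mse(lambda _x: mu, test)         # predict-the-mean reference
    if baseline == 0.0:
        return 1.0 if mse == 0.0 else 0.0
    return max(0.0, 1.0 - mse / baseline)        # clipped R^2-style reward


def mean_submission(train: List[Row]) -> Predictor:
    """Trivial pipeline: always predict the training mean."""
    mu = sum(y for _, y in train) / len(train)
    return lambda _x: mu


def gd_submission(train: List[Row], lr: float = 0.1, epochs: int = 300) -> Predictor:
    """Linear regression fitted by full-batch gradient descent on squared error."""
    d, n = len(train[0][0]), len(train)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in train:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b
```

On `make_synthetic_task(400, 5, 0.1, seed=7)`, `mean_submission` scores exactly 0 and `gd_submission` should score close to 1. Because `gd_submission` loops over every training row in every epoch, shrinking `n_rows` shrinks the cost of each reward evaluation roughly proportionally.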
