MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

Rushi Qiang; Yuchen Zhuang; Yinghao Li; Dingu Sagar V K; Rongzhi Zhang; Changhao Li; Ian Shu-Hei Wong; Sherry Yang; Percy Liang; Chao Zhang; Bo Dai

arXiv:2505.07782 · cs.LG · May 13, 2025

TL;DR

MLE-Dojo is an interactive, open-source framework for systematically training (including with reinforcement learning) and evaluating large language model agents on realistic machine learning engineering tasks, with an emphasis on iterative improvement and reproducibility.

Contribution

The paper introduces a versatile environment, built on real-world tasks, for training and assessing LLM agents in complex MLE workflows. It supports both supervised fine-tuning and reinforcement learning.

Findings

01. Current LLMs show partial improvements but struggle with long-horizon tasks.

02. The framework supports diverse data sources and tools for comprehensive evaluation.

03. Open-sourcing encourages community-driven advancements in LLM-based MLE agents.

Abstract

We introduce MLE-Dojo, a Gym-style framework for systematically training with reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that primarily rely on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment enabling agents to iteratively experiment, debug, and refine solutions through structured feedback loops. Built upon 200+ real-world Kaggle challenges, MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios such as data processing, architecture search, hyperparameter tuning, and code debugging. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning, facilitating iterative experimentation, realistic data…
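To make the "Gym-style" feedback loop concrete, here is a minimal Python sketch of how an agent might interact with such an environment. It is an illustration under stated assumptions, not MLE-Dojo's actual API: the class `ToyMLEEnv`, the function `run_episode`, the observation fields, and the reward rule (improvement over the best score so far) are all hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToyMLEEnv:
    """Hypothetical Gym-style environment (NOT the MLE-Dojo API).

    An action is a candidate solution, modeled here as a callable that
    returns a score in [0, 1] or raises an exception if the "code" fails.
    """
    target: float = 0.9   # assumed score at which the task counts as solved
    max_steps: int = 5    # assumed step budget per episode
    best_score: float = 0.0
    steps: int = 0

    def reset(self) -> dict:
        # Start a fresh episode and return the initial observation.
        self.best_score = 0.0
        self.steps = 0
        return {"feedback": "task description", "best_score": 0.0}

    def step(self, action: Callable[[], float]) -> Tuple[dict, float, bool]:
        self.steps += 1
        try:
            score = float(action())
            feedback = f"score={score:.2f}"
        except Exception as exc:
            # A failing submission yields an error message the agent can debug.
            score = 0.0
            feedback = f"error: {exc}"
        # Assumed reward rule: only improvement over the best score is rewarded.
        reward = max(0.0, score - self.best_score)
        self.best_score = max(self.best_score, score)
        done = self.best_score >= self.target or self.steps >= self.max_steps
        obs = {"feedback": feedback, "best_score": self.best_score}
        return obs, reward, done


def run_episode(env: ToyMLEEnv,
                candidates: List[Callable[[], float]]) -> List[Tuple[str, float]]:
    """Submit candidates in order until the episode ends; log feedback and reward."""
    env.reset()
    history = []
    for candidate in candidates:
        obs, reward, done = env.step(candidate)
        history.append((obs["feedback"], reward))
        if done:
            break
    return history


if __name__ == "__main__":
    # A buggy attempt, a weak attempt, then a strong attempt.
    attempts = [lambda: 1 / 0, lambda: 0.5, lambda: 0.95]
    for feedback, reward in run_episode(ToyMLEEnv(), attempts):
        print(feedback, reward)
```

The point of the sketch is the contrast the abstract draws with single-attempt evaluation: each `step` returns feedback (a score or an error message) that a real agent would use to revise its next submission, rather than being scored once on a fixed output.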

Code & Models

Repositories

MLE-Dojo/MLE-Dojo (Official)

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI)