MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Rushi Qiang, Yuchen Zhuang, Yinghao Li, Dingu Sagar V K, Rongzhi Zhang, Changhao Li, Ian Shu-Hei Wong, Sherry Yang, Percy Liang, Chao Zhang, Bo Dai

TL;DR
MLE-Dojo is an interactive, open-source framework that enables systematic reinforcement learning and evaluation of large language model agents in realistic machine learning engineering tasks, promoting iterative improvement and reproducibility.
Contribution
It introduces a versatile, real-world-inspired environment for training and assessing LLM agents in complex MLE workflows, supporting both supervised fine-tuning and reinforcement learning.
Findings
Current LLM agents make some progress through iteration but still struggle with long-horizon tasks.
The framework supports diverse data sources and tools for comprehensive evaluation.
Open-sourcing encourages community-driven advancements in LLM-based MLE agents.
Abstract
We introduce MLE-Dojo, a Gym-style framework for systematic reinforcement learning, evaluation, and improvement of autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that rely primarily on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment in which agents iteratively experiment, debug, and refine solutions through structured feedback loops. Built upon more than 200 real-world Kaggle challenges, MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios such as data processing, architecture search, hyperparameter tuning, and code debugging. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning, facilitating iterative experimentation, realistic data…
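The abstract describes a Gym-style loop: the agent acts, receives structured feedback, and refines its solution. The following is a minimal sketch of that interaction pattern only, under clearly stated assumptions. `ToyMLEEnv`, `StepResult`, and `run_episode` are hypothetical names, not MLE-Dojo's API. The "task" is a one-parameter regression standing in for a Kaggle competition. The reward (improvement over the best validation score so far) is an illustrative choice, not the paper's reward definition.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: dict  # feedback the agent reads before its next attempt
    reward: float      # improvement over the best score so far (toy choice)
    done: bool         # True once the step budget is exhausted


class ToyMLEEnv:
    """Hypothetical Gym-style environment (NOT the MLE-Dojo API).

    The toy task is to fit y = w * x. Each action submits a candidate w;
    the environment replies with a validation score or an error message.
    """

    def __init__(self, max_steps: int = 5) -> None:
        # Hidden validation data generated from y = 3x (toy assumption).
        self._xs = [1.0, 2.0, 3.0, 4.0]
        self._ys = [3.0 * x for x in self._xs]
        self.max_steps = max_steps
        self.steps = 0
        self.best_score = 0.0

    def _score(self, w: float) -> float:
        # Negative mean squared error: higher is better, 0.0 is a perfect fit.
        errors = [(w * x - y) ** 2 for x, y in zip(self._xs, self._ys)]
        return -sum(errors) / len(errors)

    def reset(self) -> dict:
        self.steps = 0
        # A trivial submission (w = 0) sets the starting best score.
        self.best_score = self._score(0.0)
        return {"task": "fit y = w * x",
                "feedback": f"baseline score {self.best_score:.4f}"}

    def step(self, action: dict) -> StepResult:
        self.steps += 1
        done = self.steps >= self.max_steps
        w = action.get("w")
        if isinstance(w, bool) or not isinstance(w, (int, float)):
            # Malformed submissions get an error message instead of a score,
            # so the agent can debug and retry on its next step.
            return StepResult({"feedback": "error: 'w' must be a number"}, 0.0, done)
        score = self._score(float(w))
        reward = max(0.0, score - self.best_score)
        self.best_score = max(self.best_score, score)
        return StepResult({"feedback": f"score {score:.4f}"}, reward, done)


def run_episode(env: ToyMLEEnv, attempts: list) -> tuple[float, list[str]]:
    """Scripted stand-in for an LLM agent: submit attempts, collect feedback."""
    obs = env.reset()
    log = [obs["feedback"]]
    total = 0.0
    for w in attempts:
        result = env.step({"w": w})
        total += result.reward
        log.append(result.observation["feedback"])
        if result.done:
            break
    return total, log


if __name__ == "__main__":
    total, log = run_episode(ToyMLEEnv(), [1.0, "oops", 2.0, 3.0, 4.0])
    print(total)  # 67.5
    for line in log:
        print(line)
```

Returning an error message instead of a score for a malformed action loosely mirrors the debug-and-retry behavior the abstract mentions. In a real setting, an LLM would replace the scripted `attempts` list and propose each next submission from the accumulated feedback log.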
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Materials Science · Explainable Artificial Intelligence (XAI)
