ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

Zexi Liu; Jingyi Chai; Xinyu Zhu; Shuo Tang; Rui Ye; Bo Zhang; Lei Bai; Siheng Chen

arXiv:2505.23723 · cs.CL · May 4, 2026

TL;DR

This paper introduces ML-Agent, an agent for autonomous ML engineering built on a 7B-parameter LLM and trained with reinforcement learning. It achieves results competitive with agents based on larger proprietary models, at lower computational cost, across multiple tasks.

Contribution

The paper proposes a learning-based agentic ML paradigm. Its training framework combines exploration-enriched fine-tuning, step-wise RL, and a unified reward module, so that an LLM agent can learn autonomous ML development efficiently.

Findings

01. ML-Agent, trained on 9 tasks, performs comparably to agents built on larger proprietary LLMs.

02. Using a 7B model reduces computational cost relative to agents built on large proprietary models, while maintaining strong cross-task generalization.

03. Reinforcement learning through interactive experimentation on ML tasks is shown to be an effective way to train agents for autonomous ML engineering.

Abstract

The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, the dominant prompt-based paradigm exhibits limitations: smaller models lack the capacity to learn from execution trajectories for generalization, while large proprietary models incur high computational overhead, restricting accessibility and scalability. Focusing on this, for the first time, we explore the paradigm of learning-based agentic ML, where an LLM agent learns through interactive experimentation on ML tasks using online reinforcement learning (RL). To realize this, we propose a novel agentic ML training framework with three key components: (1) exploration-enriched fine-tuning, which enables LLM agents to generate diverse actions for enhanced RL exploration; (2) step-wise RL, which enables training on a single action…
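The text on this page names a unified reward module but does not define it. As a rough illustration of the general idea (turning heterogeneous execution feedback into one scalar that RL can optimize), here is a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's definition: the names `StepFeedback`, `unified_reward`, and `scale`, the reward values 0 and 0.5, and the use of a sigmoid over the metric change.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepFeedback:
    """Hypothetical record of what happened after one agent action."""
    ok: bool                                # False if the action was invalid or the run failed
    metric_before: Optional[float] = None   # task metric before the action, if measured
    metric_after: Optional[float] = None    # task metric after the action, if measured
    higher_is_better: bool = True           # False for loss-like metrics


def unified_reward(fb: StepFeedback, scale: float = 1.0) -> float:
    """Map one step's feedback to a scalar in [0, 1] (illustrative values only)."""
    if not fb.ok:
        # Assumed: failed or invalid actions receive the lowest reward.
        return 0.0
    if fb.metric_before is None or fb.metric_after is None:
        # Assumed: valid actions with no measurable metric change are neutral.
        return 0.5
    delta = fb.metric_after - fb.metric_before
    if not fb.higher_is_better:
        delta = -delta
    # Clamp the exponent so math.exp cannot overflow on extreme metric changes.
    x = max(-60.0, min(60.0, scale * delta))
    # Sigmoid: improvements land above 0.5, regressions below 0.5.
    return 1.0 / (1.0 + math.exp(-x))
```

Under these assumptions, an accuracy gain and a loss reduction both score above 0.5, so tasks with different metrics share one reward scale.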
