Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

Yifei Zhang; Xu Yang; Xiao Yang; Bowen Xian; Qizheng Li; Shikai Fang; Jingyuan Li; Jian Wang; Mingrui Xu; Weiqing Liu; Jiang Bian

arXiv:2603.01692 · cs.LG · April 14, 2026

TL;DR

This paper introduces Gome, a gradient-based optimization approach for LLM agents in machine learning engineering, and shows that it outperforms traditional tree search methods as model reasoning improves.

Contribution

Gome operationalizes gradient-based optimization for LLM agents, replacing the score-based candidate enumeration of tree search with directed, diagnosis-driven updates, and achieves state-of-the-art results on MLE-Bench.

Findings

1. Gome achieves a 35.1% any-medal rate on MLE-Bench with a 12-hour budget on a single V100 GPU.
2. Gradient-based optimization outperforms tree search as model reasoning improves.
3. Scaling experiments across 10 models show a crossover point in model capability beyond which gradient-based methods overtake tree search.
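The crossover finding can be made concrete with a toy numerical analogy. The sketch below is not the paper's experiment or Gome's code. It compares a gradient-free random search, which keeps a candidate only when its scalar score improves (as tree search ranks candidates by validation score), against gradient descent whose direction estimate is corrupted by a hypothetical `noise` parameter that stands in loosely for reasoning quality. All function names and parameters are illustrative assumptions.

```python
import random


def loss(x):
    # Toy objective: squared distance to the origin.
    return sum(v * v for v in x)


def random_search(dim, budget, rng, step=0.5):
    # Gradient-free baseline: perturb the current best at random and keep the
    # candidate only if its scalar score improves (never accepts a worse one).
    best = [1.0] * dim
    best_loss = loss(best)
    for _ in range(budget):
        cand = [v + rng.gauss(0.0, step) for v in best]
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best_loss


def noisy_gradient_descent(dim, budget, rng, noise, lr=0.1):
    # Directed updates: the true gradient of loss() is 2x. `noise` is a
    # hypothetical knob for how inaccurate the update direction is.
    x = [1.0] * dim
    for _ in range(budget):
        grad = [2.0 * v + rng.gauss(0.0, noise) for v in x]
        x = [v - lr * g for v, g in zip(x, grad)]
    return loss(x)


def compare(noise_levels, dim=50, budget=100, seed=0):
    # Same evaluation budget for both methods at each noise level.
    results = {}
    for noise in noise_levels:
        rs = random_search(dim, budget, random.Random(seed))
        gd = noisy_gradient_descent(dim, budget, random.Random(seed), noise)
        results[noise] = (gd, rs)
    return results


if __name__ == "__main__":
    for noise, (gd, rs) in compare([0.0, 1.0, 5.0, 20.0]).items():
        print(f"noise={noise:5.1f}  gradient={gd:10.4g}  random={rs:10.4g}")
```

With accurate directions (`noise=0.0`), gradient descent reaches a far lower loss than random search under the same budget; with very noisy directions (`noise=20.0`), it settles at a noise floor well above its starting loss, while random search, which never accepts a worse candidate, does better. The two curves cross somewhere in between, which mirrors the shape of the effect the paper reports, though the toy says nothing about where the real crossover lies.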

Abstract

LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce Gome, an MLE agent that operationalizes gradient-based optimization. Gome maps structured diagnostic reasoning to gradient computation, success memory to momentum, and multi-trace execution to distributed optimization. Under a closed-world protocol that isolates architectural effects from external knowledge, Gome achieves a state-of-the-art 35.1% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU. Scaling experiments across 10 models reveal a critical…
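The abstract pairs each Gome component with a classical optimization idea. As a minimal sketch of those classical ideas only (not Gome's implementation, which works on LLM-generated diagnoses rather than numeric vectors), the code below runs heavy-ball momentum on a quadratic while averaging noisy gradient estimates from several parallel workers at each step. The correspondence noted in the comments (worker estimate as diagnostic reasoning, velocity buffer as success memory, worker averaging as multi-trace execution) is an illustrative assumption, and every name and parameter here is hypothetical.

```python
import random


def true_grad(x):
    # Exact gradient of f(x) = sum(x_i^2).
    return [2.0 * v for v in x]


def worker_gradient(x, rng, noise):
    # One worker's noisy gradient estimate
    # (loose analogue of one diagnostic pass).
    return [g + rng.gauss(0.0, noise) for g in true_grad(x)]


def distributed_momentum(dim=10, steps=200, workers=4, lr=0.05,
                         beta=0.9, noise=1.0, seed=0):
    rng = random.Random(seed)
    x = [1.0] * dim
    velocity = [0.0] * dim  # momentum buffer (loose analogue of success memory)
    for _ in range(steps):
        # Distributed step: average estimates from several workers
        # (loose analogue of multi-trace execution).
        grads = [worker_gradient(x, rng, noise) for _ in range(workers)]
        avg = [sum(gs) / workers for gs in zip(*grads)]
        # Heavy-ball momentum: v <- beta * v + g, then x <- x - lr * v.
        velocity = [beta * v + g for v, g in zip(velocity, avg)]
        x = [xi - lr * vi for xi, vi in zip(x, velocity)]
    return sum(v * v for v in x)


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(f"workers={k:2d}  final loss={distributed_momentum(workers=k):.4f}")
```

Averaging more workers lowers the noise floor at which the iterate settles, while the momentum buffer keeps pushing along directions that were consistently useful in earlier steps.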

Code & Models

Repositories

microsoft/RD-Agent (GitHub)

Datasets

amstrongzyf/Gome-GPT5-Traces (dataset, 36 downloads)