ReX-MLE: The Autonomous Agent Benchmark for Medical Imaging Challenges

Roshan Kenia; Xiaoman Zhang; Pranav Rajpurkar

arXiv:2512.17838 · cs.CV · December 22, 2025

Open Access

TL;DR

ReX-MLE is a comprehensive benchmark designed to evaluate autonomous AI agents on complex medical imaging challenges, revealing significant performance gaps and highlighting areas for domain-specific improvement.

Contribution

The paper introduces a new benchmark for autonomous agents in medical imaging. It assesses end-to-end workflows across diverse challenges and exposes the current limitations of state-of-the-art agents.

Findings

01

Most agents rank in the 0th percentile relative to human participants.

02

The performance gap stems from limitations in domain knowledge and engineering.

03

ReX-MLE highlights bottlenecks and guides future development.
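
To make the "0th percentile" finding concrete, here is a minimal sketch of one common way to rank a score against a human leaderboard: the share of human entries that the agent strictly beats. This is an assumption for illustration only; the summary above does not state how ReX-MLE computes percentiles, and the function name `percentile_rank`, the `higher_is_better` flag, and the leaderboard values are all hypothetical.

```python
from typing import Sequence


def percentile_rank(
    agent_score: float,
    human_scores: Sequence[float],
    higher_is_better: bool = True,
) -> float:
    """Return the percentage of human entries the agent strictly beats.

    Illustrative definition only; not taken from the ReX-MLE paper.
    """
    if not human_scores:
        raise ValueError("human_scores must not be empty")
    if higher_is_better:
        beaten = sum(1 for s in human_scores if agent_score > s)
    else:
        beaten = sum(1 for s in human_scores if agent_score < s)
    return 100.0 * beaten / len(human_scores)


# Hypothetical leaderboard of human scores (made-up numbers, e.g. Dice).
human_leaderboard = [0.91, 0.89, 0.88, 0.85, 0.80]

weak_agent = percentile_rank(0.42, human_leaderboard)    # beats no one
decent_agent = percentile_rank(0.86, human_leaderboard)  # beats two of five

print(weak_agent)    # 0.0
print(decent_agent)  # 40.0
```

Under this definition, an agent that scores below every human entry lands in the 0th percentile no matter how far below it falls, so the figure signals a gap without measuring its size.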

Abstract

Autonomous coding agents built on large language models (LLMs) can now solve many general software and machine learning tasks, but they remain ineffective on complex, domain-specific scientific problems. Medical imaging is a particularly demanding domain, requiring long training cycles, high-dimensional data handling, and specialized preprocessing and validation pipelines, capabilities not fully measured in existing agent benchmarks. To address this gap, we introduce ReX-MLE, a benchmark of 20 challenges derived from high-impact medical imaging competitions spanning diverse modalities and task types. Unlike prior ML-agent benchmarks, ReX-MLE evaluates full end-to-end workflows, requiring agents to independently manage data preprocessing, model training, and submission under realistic compute and time constraints. Evaluating state-of-the-art agents (AIDE, ML-Master, R&D-Agent) with…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Multimodal Machine Learning Applications