ReX-MLE: The Autonomous Agent Benchmark for Medical Imaging Challenges
Roshan Kenia, Xiaoman Zhang, Pranav Rajpurkar

TL;DR
ReX-MLE is a comprehensive benchmark designed to evaluate autonomous AI agents on complex medical imaging challenges, revealing significant performance gaps and highlighting areas for domain-specific improvement.
Contribution
It introduces a new benchmark for autonomous agents in medical imaging, assessing end-to-end workflows across diverse challenges and exposing current limitations of state-of-the-art agents.
Findings
Most agents rank in the 0th percentile relative to human participants.
Performance gaps stem from limitations in both domain knowledge and engineering.
ReX-MLE highlights bottlenecks and guides future development.
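A percentile ranking against humans can be read as the share of a challenge's human leaderboard that an agent's submission outperforms. The sketch below is a minimal illustration under that assumption. The exact percentile definition ReX-MLE uses is not stated here, and the function name `percentile_rank` and the example scores are hypothetical.

```python
def percentile_rank(agent_score: float, human_scores: list[float],
                    higher_is_better: bool = True) -> float:
    """Hypothetical helper: percentage of human leaderboard entries the agent strictly beats.

    Assumption: percentile = 100 * (number of human entries outperformed) / (total entries).
    This is an illustrative definition, not necessarily the one used by ReX-MLE.
    """
    if not human_scores:
        raise ValueError("human_scores must be non-empty")
    if higher_is_better:
        # e.g. Dice or AUC, where a larger value is better
        beaten = sum(1 for s in human_scores if agent_score > s)
    else:
        # e.g. Hausdorff distance or error, where a smaller value is better
        beaten = sum(1 for s in human_scores if agent_score < s)
    return 100.0 * beaten / len(human_scores)


if __name__ == "__main__":
    # Hypothetical human leaderboard of Dice scores for one challenge
    leaderboard = [0.91, 0.88, 0.85, 0.80, 0.72]
    print(percentile_rank(0.65, leaderboard))  # 0.0: below every human entry
    print(percentile_rank(0.86, leaderboard))  # 60.0: beats 3 of 5 entries
```

Under this definition, an agent lands in the 0th percentile whenever its score fails to beat even the lowest-ranked human submission.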
Abstract
Autonomous coding agents built on large language models (LLMs) can now solve many general software and machine learning tasks, but they remain ineffective on complex, domain-specific scientific problems. Medical imaging is a particularly demanding domain, requiring long training cycles, high-dimensional data handling, and specialized preprocessing and validation pipelines, capabilities not fully measured in existing agent benchmarks. To address this gap, we introduce ReX-MLE, a benchmark of 20 challenges derived from high-impact medical imaging competitions spanning diverse modalities and task types. Unlike prior ML-agent benchmarks, ReX-MLE evaluates full end-to-end workflows, requiring agents to independently manage data preprocessing, model training, and submission under realistic compute and time constraints. Evaluating state-of-the-art agents (AIDE, ML-Master, R&D-Agent) with…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Multimodal Machine Learning Applications
