What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Alexis Audran-Reiss; Jordi Armengol-Estapé; Karen Hambardzumyan; Amar Budhiraja; Martin Josifoski; Edan Toledo; Rishi Hazra; Despoina Magka; Michael Shvartsman; Parth Pathak; Justine T Kao; Lucia Cipolina-Kun; Bhavul Gauri; Jean-Christophe Gagnon-Audet; Emanuel Tewolde; Jenny Zhang; Taco Cohen; Yossi Adi; Tatiana Shavrina; Yoram Bachrach

arXiv:2511.15593 · cs.AI · December 10, 2025

Open Access

TL;DR

This paper investigates how ideation diversity influences AI research agent performance, demonstrating that greater diversity leads to improved results across various models and metrics.

Contribution

The paper provides the first systematic analysis linking ideation diversity to agent success, and confirms the positive effect of diversity on performance in a controlled experiment.

Findings

01. Higher ideation diversity correlates with better agent performance.

02. Modifying diversity levels can directly influence agent success.

03. Results are consistent across multiple evaluation metrics.

Abstract

AI research agents offer the promise to accelerate scientific progress by automating the design, implementation, and training of machine learning models. However, the field is still in its infancy, and the key factors driving the success or failure of agent trajectories are not fully understood. We examine the role that ideation diversity plays in agent performance. First, we analyse agent trajectories on MLE-bench, a well-known benchmark to evaluate AI research agents, across different models and agent scaffolds. Our analysis reveals that different models and agent scaffolds yield varying degrees of ideation diversity, and that higher-performing agents tend to have increased ideation diversity. Further, we run a controlled experiment where we modify the degree of ideation diversity, demonstrating that higher ideation diversity results in stronger performance. Finally, we strengthen our…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing