Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments
Xiaoyi Li

TL;DR
This study demonstrates that LLM agents primarily perform genuine architecture search rather than hyperparameter tuning, with architectural choices explaining most of the performance variance in ML experiment design.
Contribution
The paper provides the first large-scale empirical analysis of LLM-guided combinatorial ML experiment design, highlighting the dominance of architecture discovery over hyperparameter tuning.
Findings
Architectural choices explain 94% of performance variance.
LLM agents discover novel, effective architectures, such as V-JEPA 2 combined with Zipformer.
Power-law convergence reflects the cost of broad exploration, not agent inefficiency.
Abstract
When LLM agents autonomously design ML experiments, do they perform genuine architecture search, or do they default to hyperparameter tuning within a narrow region of the design space? We answer this question by analyzing 10,469 experiments executed by two LLM agents (Claude Opus and Gemini 2.5 Pro) across a combinatorial configuration space of 108,000 discrete cells for dashcam collision detection over 27 days. Through ANOVA decomposition, we find that architectural choices explain 94% of performance variance, while hyperparameter variation within a fixed architecture explains only 6%. Cross-task validation on a second collision dataset confirms this finding (75% architecture-explained variance) with a different winning backbone, confirming genuine architecture discovery. The agents' key contribution is discovering that V-JEPA 2 video…
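The variance split quoted in the abstract can be illustrated with a one-way decomposition of sums of squares. The sketch below is not the authors' analysis code: the function name `variance_explained_by_group`, the architecture labels, and the synthetic scores are all assumptions made for illustration. It computes eta-squared, the share of total score variance that lies between architecture groups; the remainder is the within-architecture share, which in the paper's framing corresponds to hyperparameter variation.

```python
import numpy as np


def variance_explained_by_group(scores, groups):
    """Return eta-squared: between-group sum of squares / total sum of squares.

    Hypothetical helper for illustration. Assumes the scores are not all
    identical (otherwise the total sum of squares is zero).
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    grand_mean = scores.mean()
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_between = 0.0
    for g in np.unique(groups):
        member = scores[groups == g]
        # Each group contributes its size times the squared offset of its mean.
        ss_between += member.size * (member.mean() - grand_mean) ** 2
    return ss_between / ss_total


# Synthetic stand-in data: three made-up architectures with different mean
# scores and a small spread that plays the role of hyperparameter variation.
rng = np.random.default_rng(0)
arch_means = {"arch_a": 0.60, "arch_b": 0.75, "arch_c": 0.90}
groups = np.repeat(list(arch_means), 200)
scores = np.concatenate(
    [mean + rng.normal(0.0, 0.02, 200) for mean in arch_means.values()]
)

eta_sq = variance_explained_by_group(scores, groups)
print(f"architecture share: {eta_sq:.3f}")
print(f"within-architecture share: {1 - eta_sq:.3f}")
```

With real experiment logs, `groups` would hold each run's architecture identifier and `scores` its evaluation metric. A factorial ANOVA over several design factors would need a fuller model than this single-factor sketch.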
Taxonomy
Topics: Robotic Path Planning Algorithms · Reinforcement Learning in Robotics · Robot Manipulation and Learning
