TL;DR
This study uses autoresearch to compare classical hyperparameter optimization (HPO) algorithms with LLM-based methods. Classical methods generally outperform LLMs; the exception is hybrid approaches such as Centaur, which combine the strengths of both.
Contribution
The paper introduces Centaur, a hybrid hyperparameter optimization method that combines CMA-ES with an LLM and outperforms both purely classical and purely LLM-based approaches.
Findings
Classical methods outperform LLMs in fixed search spaces.
Allowing LLMs to edit the training code directly narrows the performance gap but does not close it.
Centaur, a hybrid approach, outperforms all tested methods.
Abstract
The autoresearch repository enables an LLM agent to optimize hyperparameters by editing training code directly. We use it as a testbed to compare classical HPO algorithms against LLM-based methods on tuning the hyperparameters of a small language model under a fixed compute budget. On a fixed search space defined over autoresearch, classical methods such as CMA-ES and TPE consistently outperform LLM-based agents; in this setting, avoiding out-of-memory failures matters more than search diversity. Allowing the LLM to edit source code directly narrows the gap to the classical methods but does not close it, even with frontier models available at the time of writing such as Claude Opus 4.6 and Gemini 3.1 Pro Preview. We observe that LLMs struggle to track optimization state across trials. Classical methods, in turn, lack the domain knowledge of LLMs. To combine the strengths of both, we…
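To make two of the abstract's points concrete (a classical optimizer keeps explicit state across trials, and out-of-memory failures must be handled inside a fixed search space), here is a minimal sketch. Everything in it is a hypothetical assumption, not the paper's setup: the search space, the toy loss, the memory model, and the names `SEARCH_SPACE`, `OutOfMemory`, `train_and_eval` and `evolution_strategy`. The optimizer is a simple elitist evolution strategy with isotropic Gaussian mutations and a decaying step size; it is not CMA-ES, which additionally adapts a full covariance matrix.

```python
import math
import random

# Hypothetical fixed search space (not the paper's): name -> (low, high)
# bounds in the coordinates the optimizer searches over.
SEARCH_SPACE = {
    "log10_lr": (-5.0, -2.0),
    "depth": (2.0, 12.0),
    "log2_batch": (4.0, 10.0),
}
BOUNDS = list(SEARCH_SPACE.values())


class OutOfMemory(Exception):
    """Raised by the toy trainer when a configuration would not fit in memory."""


def clip(x):
    """Project a search vector back into the box defined by SEARCH_SPACE."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, BOUNDS)]


def decode(x):
    """Map a clipped search vector to concrete hyperparameters."""
    log10_lr, depth, log2_batch = x
    return {
        "lr": 10.0 ** log10_lr,
        "depth": int(round(depth)),
        "batch_size": 2 ** int(round(log2_batch)),
    }


def train_and_eval(hp, memory_budget=2000):
    """Hypothetical stand-in for one training run under a fixed budget.

    Raises OutOfMemory if a toy memory estimate exceeds the budget;
    otherwise returns a synthetic validation loss (lower is better).
    """
    if hp["depth"] * hp["batch_size"] > memory_budget:
        raise OutOfMemory
    # Toy loss surface with its minimum at lr=1e-3, depth=8, batch_size=64.
    return ((math.log10(hp["lr"]) + 3.0) ** 2
            + 0.05 * (hp["depth"] - 8) ** 2
            + 0.1 * (math.log2(hp["batch_size"]) - 6) ** 2)


def evolution_strategy(n_gens=30, pop=8, sigma=0.5, seed=0):
    rng = random.Random(seed)
    # Explicit optimizer state carried across trials: search mean and step size.
    mean = [(lo + hi) / 2 for lo, hi in BOUNDS]
    best_loss, best_hp = float("inf"), None
    for _ in range(n_gens):
        scored = []
        for _ in range(pop):
            x = clip([m + sigma * rng.gauss(0.0, 1.0) for m in mean])
            hp = decode(x)
            try:
                loss = train_and_eval(hp)
            except OutOfMemory:
                loss = float("inf")  # a crashed trial gets the worst score
            scored.append((loss, x))
            if loss < best_loss:
                best_loss, best_hp = loss, hp
        # Move the mean to the average of the best half (finite losses only).
        scored.sort(key=lambda t: t[0])
        elite = [x for loss, x in scored[: pop // 2] if math.isfinite(loss)]
        if elite:
            mean = [sum(col) / len(elite) for col in zip(*elite)]
        sigma *= 0.95  # simple step-size decay in place of CMA-ES adaptation
    return best_loss, best_hp


best_loss, best_hp = evolution_strategy()
print(f"best loss {best_loss:.4f} with {best_hp}")
```

Scoring an out-of-memory trial as infinitely bad keeps the mean away from configurations that crash, and the mean and step size are the only state the optimizer needs to remember between trials. An LLM agent has to reconstruct that state from its trial history.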
