Sequential Large Language Model-Based Hyper-parameter Optimization
Kanan Mahammadli, Seyda Ertekin

TL;DR
This paper presents SLLMBO, a novel framework using large language models for hyperparameter optimization, incorporating dynamic search space adaptation and a new LLM-TPE sampler, outperforming traditional methods on multiple tasks.
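The summary does not say how SLLMBO adapts the search space. As a rough, rule-based stand-in for illustration only, the sketch below widens a numeric range when the best value so far sits near one of its edges, since the optimum may lie outside it. Otherwise it narrows the range around the best value to exploit that region. `Range`, `adapt_range`, and all thresholds are hypothetical names and defaults, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Range:
    low: float
    high: float


def adapt_range(r: Range, best: float, hard_low: float = float("-inf"),
                hard_high: float = float("inf"), edge_frac: float = 0.1,
                expand: float = 1.5, shrink: float = 0.5) -> Range:
    """Hypothetical rule: re-center a 1-D range on `best`, widening it if
    `best` lies within `edge_frac` of either edge and narrowing it otherwise.
    Results are clipped to [hard_low, hard_high], e.g. to keep a learning
    rate positive."""
    width = r.high - r.low
    near_edge = min(best - r.low, r.high - best) < edge_frac * width
    new_width = width * (expand if near_edge else shrink)
    return Range(max(hard_low, best - new_width / 2),
                 min(hard_high, best + new_width / 2))


# Best value 0.95 sits near the upper edge of [0, 1], so the range widens.
print(adapt_range(Range(0.0, 1.0), 0.95))
```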
Contribution
Introduces SLLMBO, the first framework benchmarking diverse LLMs for HPO, with a novel LLM-TPE sampler balancing exploration and exploitation effectively.
Findings
LLM-TPE outperforms fully LLM-based methods on 9 of 14 tasks.
SLLMBO achieves more robust optimization than traditional Bayesian methods.
The framework reduces API costs and mitigates premature early stopping.
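The summary does not describe SLLMBO's stopping rule. For orientation only, a common generic criterion halts a search once the best score has failed to improve for a fixed number of trials. The sketch below assumes higher scores are better. `should_stop`, `patience`, and `min_delta` are illustrative names, not parameters from the paper.

```python
def should_stop(history: list[float], patience: int = 5,
                min_delta: float = 0.0) -> bool:
    """Return True when none of the last `patience` scores (higher is better)
    beats the best earlier score by more than `min_delta`."""
    if len(history) <= patience:
        return False  # not enough trials to judge a plateau yet
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent <= best_before + min_delta


scores = [0.71, 0.78, 0.80, 0.79, 0.80, 0.78]
print(should_stop(scores, patience=3))  # True: no gain in the last 3 trials
```

A small `patience` saves trials, and therefore API calls when an LLM proposes each trial, but raises the risk of stopping before the search has converged. A larger value trades cost for that safety.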
Abstract
This study introduces SLLMBO, an innovative framework that leverages large language models (LLMs) for hyperparameter optimization (HPO), incorporating dynamic search space adaptability, enhanced parameter space exploitation, and a novel LLM-Tree-structured Parzen Estimator (LLM-TPE) sampler. By addressing limitations in recent fully LLM-based methods and traditional Bayesian optimization (BO), SLLMBO achieves more robust optimization. This comprehensive benchmark evaluates multiple LLMs, including GPT-3.5-Turbo, GPT-4o, Claude-Sonnet-3.5, and Gemini-1.5-Flash, extending prior work and establishing SLLMBO as the first framework to benchmark a diverse set of LLMs for HPO. By integrating LLMs' established strengths in parameter initialization with the exploitation abilities demonstrated in this study, alongside TPE's exploration capabilities, the LLM-TPE sampler achieves a balanced…
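The abstract only outlines the LLM-TPE sampler: LLM strengths in initialization and exploitation combined with TPE's exploration. The sketch below illustrates that general idea under stated assumptions. The first `n_init` trials are proposed by the LLM. After that, each trial comes from the LLM with probability `p_llm` and otherwise from a simplified, from-scratch TPE step. The switching rule, the fixed Parzen bandwidth, and every name here (`hybrid_search`, `tpe_propose`, `llm_propose`, `n_init`, `p_llm`) are hypothetical. The LLM is replaced by a uniform-random stub. This is not the paper's implementation; a maintained TPE such as Optuna's `TPESampler` would be the practical choice.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]
History = List[Tuple[Params, float]]


def tpe_propose(history: History, bounds: Bounds, rng: random.Random,
                gamma: float = 0.25, n_candidates: int = 24) -> Params:
    """Simplified TPE step for a loss to be minimized: split past trials into
    'good' (lowest gamma fraction) and 'bad', model each with a per-parameter
    Gaussian Parzen estimator, and return the candidate sampled around the
    good trials that maximizes log l(x) - log g(x)."""
    ranked = sorted(history, key=lambda t: t[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [p for p, _ in ranked[:n_good]]
    bad = [p for p, _ in ranked[n_good:]] or good  # too few trials: no split

    def log_density(x: Params, points: List[Params]) -> float:
        total = 0.0
        for name, (lo, hi) in bounds.items():
            bw = 0.2 * (hi - lo)  # fixed bandwidth, a simplification
            dens = sum(math.exp(-0.5 * ((x[name] - p[name]) / bw) ** 2)
                       for p in points)
            norm = len(points) * bw * math.sqrt(2 * math.pi)
            total += math.log(dens / norm + 1e-300)  # guard against log(0)
        return total

    best, best_score = None, -math.inf
    for _ in range(n_candidates):
        center = rng.choice(good)  # sample from l(x): perturb a good trial
        cand = {name: min(hi, max(lo, rng.gauss(center[name], 0.2 * (hi - lo))))
                for name, (lo, hi) in bounds.items()}
        score = log_density(cand, good) - log_density(cand, bad)
        if score > best_score:
            best, best_score = cand, score
    return best


def hybrid_search(objective: Callable[[Params], float], bounds: Bounds,
                  llm_propose: Callable[[History, Bounds], Params],
                  n_trials: int = 30, n_init: int = 5, p_llm: float = 0.5,
                  seed: int = 0) -> Tuple[Params, float]:
    """Hypothetical switching rule: the first n_init trials come from the LLM,
    then each trial comes from the LLM with probability p_llm, else from TPE."""
    rng = random.Random(seed)
    history: History = []
    for t in range(n_trials):
        if t < max(1, n_init) or rng.random() < p_llm:
            params = llm_propose(history, bounds)
        else:
            params = tpe_propose(history, bounds, rng)
        history.append((params, objective(params)))
    return min(history, key=lambda trial: trial[1])


if __name__ == "__main__":
    stub_rng = random.Random(1)

    def fake_llm(history: History, bounds: Bounds) -> Params:
        # Hypothetical placeholder for an LLM call; here a uniform random draw.
        return {k: stub_rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}

    space = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}
    loss = lambda p: (p["x"] - 1) ** 2 + (p["y"] + 2) ** 2
    print(hybrid_search(loss, space, fake_llm))
```

In this sketch, `p_llm` sets the mix between LLM-driven and TPE-driven trials. Lowering it also reduces the number of LLM API calls per run.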
Taxonomy
Topics: Web Data Mining and Analysis
Methods: Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Cosine Annealing · Transformer · Byte Pair Encoding · Layer Normalization
