Sequential Large Language Model-Based Hyper-parameter Optimization

Kanan Mahammadli; Seyda Ertekin

arXiv:2410.20302 · cs.LG · January 6, 2025 · 2 cites

TL;DR

This paper presents SLLMBO, a novel framework using large language models for hyperparameter optimization, incorporating dynamic search space adaptation and a new LLM-TPE sampler, outperforming traditional methods on multiple tasks.

Contribution

Introduces SLLMBO, the first framework benchmarking diverse LLMs for HPO, with a novel LLM-TPE sampler balancing exploration and exploitation effectively.

Findings

01. LLM-TPE outperforms fully LLM-based methods on 9 of 14 tasks.

02. SLLMBO achieves more robust optimization than traditional Bayesian methods.

03. The framework reduces API costs and mitigates premature early stopping.

Abstract

This study introduces SLLMBO, an innovative framework leveraging large language models (LLMs) for hyperparameter optimization (HPO), incorporating dynamic search space adaptability, enhanced parameter space exploitation, and a novel LLM-Tree-structured Parzen Estimator (LLM-TPE) sampler. By addressing limitations in recent fully LLM-based methods and traditional Bayesian optimization (BO), SLLMBO achieves more robust optimization. The benchmark evaluates multiple LLMs, including GPT-3.5-Turbo, GPT-4o, Claude-Sonnet-3.5, and Gemini-1.5-Flash, extending prior work and establishing SLLMBO as the first framework to benchmark a diverse set of LLMs for HPO. By integrating LLMs' established strengths in parameter initialization with the exploitation abilities demonstrated in this study, alongside TPE's exploration capabilities, the LLM-TPE sampler achieves a balanced…
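The abstract describes LLM-TPE as combining LLM-driven exploitation with TPE-driven exploration, but the excerpt here does not say how the two are combined. The sketch below illustrates one possible scheme, a per-trial coin flip between two proposal sources, and should be read as an assumption rather than the authors' method. Everything in it is hypothetical: the function names, the `llm_prob` parameter, the toy objective, and both stub proposers. The "LLM" stub simply perturbs the best point seen so far, and the "TPE" stub is plain log-uniform random sampling standing in for a real TPE sampler.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
History = List[Tuple[Params, float]]  # (params, score) pairs; higher score is better


def hybrid_suggest(
    history: History,
    llm_propose: Callable[[History], Params],
    tpe_propose: Callable[[History], Params],
    llm_prob: float,
    rng: random.Random,
) -> Tuple[str, Params]:
    """Choose one proposal source for this trial (assumed coin-flip scheme)."""
    if rng.random() < llm_prob:
        return "llm", llm_propose(history)
    return "tpe", tpe_propose(history)


def make_stub_llm(rng: random.Random) -> Callable[[History], Params]:
    """Hypothetical stand-in for an LLM call: exploit by perturbing the best point."""
    def propose(history: History) -> Params:
        if not history:
            return {"learning_rate": 0.1}  # fixed warm-start guess
        best_params, _ = max(history, key=lambda item: item[1])
        lr = best_params["learning_rate"] * rng.uniform(0.8, 1.25)
        return {"learning_rate": min(max(lr, 1e-4), 1.0)}
    return propose


def make_stub_explorer(rng: random.Random) -> Callable[[History], Params]:
    """Hypothetical stand-in for TPE: log-uniform random search (not real TPE)."""
    def propose(history: History) -> Params:
        return {"learning_rate": 10 ** rng.uniform(-4, 0)}
    return propose


def toy_objective(params: Params) -> float:
    """Toy score that peaks at 0.0 when learning_rate == 0.03."""
    return -(math.log10(params["learning_rate"]) - math.log10(0.03)) ** 2


def run_search(
    objective: Callable[[Params], float],
    n_trials: int = 20,
    llm_prob: float = 0.5,
    seed: int = 0,
) -> Tuple[Tuple[Params, float], List[str]]:
    """Run the hybrid loop; return the best (params, score) and the source per trial."""
    rng = random.Random(seed)
    llm_propose = make_stub_llm(rng)
    tpe_propose = make_stub_explorer(rng)
    history: History = []
    sources: List[str] = []
    for _ in range(n_trials):
        source, params = hybrid_suggest(history, llm_propose, tpe_propose, llm_prob, rng)
        history.append((params, objective(params)))
        sources.append(source)
    best = max(history, key=lambda item: item[1])
    return best, sources


if __name__ == "__main__":
    best, sources = run_search(toy_objective)
    print("best:", best)
    print("llm trials:", sources.count("llm"), "tpe trials:", sources.count("tpe"))
```

In a real setup, the exploration stub could be replaced by an actual TPE implementation and the exploitation stub by a call to an LLM API that receives the trial history in its prompt. Setting `llm_prob` to 1.0 or 0.0 recovers a purely LLM-driven or purely sampler-driven search, which makes the mixing ratio easy to ablate.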

Code & Models

Repositories

kananmahammadli/sllmbo (Official)

Taxonomy

Topics: Web Data Mining and Analysis

Methods: Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Cosine Annealing · Transformer · Byte Pair Encoding · Layer Normalization