BoLT: A Benchmark to Democratize Black-box Optimization Research for Expensive LLM Tasks

Ruth Wan Theng Chew; Zhiliang Chen; Apivich Hemachandra; Bryan Kian Hsiang Low

arXiv:2605.17000·cs.LG·May 19, 2026

TL;DR

BoLT is a benchmark for evaluating black-box optimization (BBO) methods on real, expensive large language model (LLM) tasks, aimed at making this line of research accessible to the wider BBO community.

Contribution

The paper introduces the first LLM-centric BBO benchmark built on real experimental data, covering complex optimization scenarios.

Findings

01

Selected BO methods outperform others across tasks.

02

The evaluation identifies gaps in existing BBO methods for LLM optimization.

03

BoLT enables reproducible evaluation on real LLM data.

Abstract

Optimization of LLM training and inference configurations, such as hyperparameters, data mixtures, and prompts, is critical to performance, but in practice it is often approached heuristically, which can lead to suboptimal outcomes. Framed as noisy, expensive, and derivative-free optimization problems, these tasks are a natural fit for Bayesian optimization (BO) and other black-box optimization (BBO) methods, which offer a promising yet underexplored direction for principled, sample-efficient optimization. However, LLM training and inference costs are prohibitively high for most of the BBO research community, so new methods are often evaluated only on synthetic test functions and small-scale datasets that fail to capture the challenges of modern LLM optimization problems. This impedes the development of BBO methods and makes it difficult to assess their effectiveness on modern LLM tasks. We introduce BoLT, the first…
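
To make the "noisy, expensive, and derivative-free" framing concrete, here is a minimal Python sketch of a black-box optimization loop. It is an illustration only and does not use BoLT's code or data: `noisy_objective` is a hypothetical stand-in for an expensive LLM evaluation, the two configuration values (`lr`, `mix`) are made-up normalized parameters, and random search is used as a simple baseline rather than as any method evaluated in the paper.

```python
import random


def noisy_objective(config, rng):
    """Hypothetical stand-in for an expensive LLM evaluation (not BoLT's API)."""
    lr, mix = config
    # Underlying score, unknown to the optimizer; peak at (0.3, 0.7) is arbitrary.
    true_score = -(lr - 0.3) ** 2 - (mix - 0.7) ** 2
    # Each query returns the score corrupted by observation noise.
    return true_score + rng.gauss(0.0, 0.01)


def random_search(objective, budget, seed=0):
    """Query the black box `budget` times; return the best (config, score) seen."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = (rng.random(), rng.random())  # sample from the unit square
        score = objective(config, rng)         # function values only, no gradients
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# A small budget mimics the setting where every evaluation is costly.
best_config, best_score = random_search(noisy_objective, budget=20)
```

The optimizer only ever sees noisy scores for the configurations it chooses to query, and the budget caps how many queries it may make. A sample-efficient method such as BO would replace the uniform sampling step with a model-guided choice of the next configuration.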

Code & Models

Repositories

chewwt/bolt (GitHub)
