Reproducibility Study of Large Language Model Bayesian Optimization
Adam Rychert, Gasper Spagnolo, Evgenii Posashkov

TL;DR
This study reproduces and validates LLAMBO, a prompting-based Bayesian optimization framework built on large language models, and shows that it remains effective when the model backbone is changed, in particular to the open-weight Llama 3.1 70B.
Contribution
The study confirms the core claims of LLAMBO and shows that the method is robust to different language model backbones, while highlighting the importance of textual context and the limitations of smaller models.
Findings
Contextual warm starting improves early regret and reduces variance.
LLAMBO's candidate sampler outperforms TPE and random sampling.
Smaller models like Gemma 27B show unstable or invalid predictions.
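The findings above are stated in terms of regret. As a minimal sketch of how such a curve could be computed, the function below tracks the best score found so far and rescales it between a known best and worst score. The name `normalized_regret`, its arguments, and the minimization convention are assumptions made for illustration; the study's exact metric definition may differ.

```python
from typing import List, Sequence


def normalized_regret(scores: Sequence[float], best: float, worst: float) -> List[float]:
    """Running normalized regret for a minimization problem (illustrative sketch).

    scores: objective values in the order the optimizer evaluated them.
    best / worst: reference best (lowest) and worst (highest) achievable scores;
    how these references are obtained is an assumption, not taken from the study.
    """
    if worst <= best:
        raise ValueError("worst must be strictly greater than best")

    regrets: List[float] = []
    incumbent = float("inf")  # best score seen so far
    for score in scores:
        incumbent = min(incumbent, score)
        # 0.0 means the optimum has been found; 1.0 means no better than the worst score.
        regrets.append((incumbent - best) / (worst - best))
    return regrets


curve = normalized_regret([0.8, 0.5, 0.6, 0.2], best=0.0, worst=1.0)
print(curve)
```

Averaging such curves over several seeded runs, and reporting their spread, is one way to compare early regret and run-to-run variance between a warm-started optimizer and a baseline.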
Abstract
In this reproducibility study, we revisit the LLAMBO framework of Daxberger et al. (2024), a prompting-based Bayesian optimization (BO) method that uses large language models as discriminative surrogates and acquisition optimizers via text-only interactions. We replicate the core Bayesmark and HPOBench experiments under the original evaluation protocol, but replace GPT-3.5 with the open-weight Llama 3.1 70B model, which is used for all text encoding components. Our results broadly confirm the main claims of LLAMBO. Contextual warm starting via textual problem and hyperparameter descriptions substantially improves early regret behaviour and reduces variance across runs. LLAMBO's discriminative surrogate is weaker than GP or SMAC as a pure single-task regressor, yet benefits from cross-task semantic priors induced by the language model. Ablations that remove textual context markedly degrade…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
