Sample-Efficient Optimisation with Probabilistic Transformer Surrogates
Alexandre Maraval, Matthieu Zimmer, Antoine Grosnit, Rasul Tutunov, Jun Wang, Haitham Bou Ammar

TL;DR
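This paper introduces a probabilistic transformer surrogate for Bayesian Optimisation. It addresses two limitations in how such models are trained (uniformly distributed training inputs, and a loss that guarantees accurate posteriors only asymptotically) and reports performance competitive with Gaussian Processes at roughly an order of magnitude faster inference.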
Contribution
It proposes a BO-specific training prior and a posterior regularizer for transformers, enabling effective, pre-trained surrogate models without retraining.
Findings
The transformer surrogate matches GP performance on benchmarks
Inference is an order of magnitude faster, owing to pre-training
The surrogate remains effective in exploration regions despite the training challenges
Abstract
Faced with problems of increasing complexity, recent research in Bayesian Optimisation (BO) has focused on adapting deep probabilistic models as flexible alternatives to Gaussian Processes (GPs). In a similar vein, this paper investigates the feasibility of employing state-of-the-art probabilistic transformers in BO. Upon further investigation, we observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation. First, we notice that these models are trained on uniformly distributed inputs, which impairs predictive accuracy on non-uniform data - a setting arising from any typical BO loop due to exploration-exploitation trade-offs. Second, we realise that training losses (e.g., cross-entropy) only asymptotically guarantee accurate posterior approximations, i.e., after arriving at the global optimum,…
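The first drawback in the abstract, that a BO loop produces non-uniformly distributed inputs, can be illustrated with a small sketch. This is not the paper's method: it swaps in a plain GP surrogate with an upper-confidence-bound acquisition, and the toy objective, kernel lengthscale, `beta`, and all function names are assumptions chosen for illustration only. It compares a histogram of the points a BO loop queries against one built from uniform samples.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Zero-mean GP posterior mean and standard deviation at x_query.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    solved = np.linalg.solve(K, K_s)              # K^{-1} K_s
    mean = solved.T @ y_train
    var = 1.0 - np.sum(K_s * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def objective(x):
    # Hypothetical toy objective with its maximum at x = 0.7.
    return -(x - 0.7) ** 2

def run_bo_loop(n_init=3, n_iter=20, beta=2.0, seed=0):
    # Returns every input queried: uniform initial points, then UCB picks.
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    x = rng.uniform(0.0, 1.0, size=n_init)
    y = objective(x)
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(mean + beta * std)]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    return x

# Compare where the BO loop queried with where uniform sampling would land.
bo_inputs = run_bo_loop()
uniform_inputs = np.random.default_rng(1).uniform(0.0, 1.0, size=len(bo_inputs))
bins = np.linspace(0.0, 1.0, 6)
print("BO-loop input counts per bin:", np.histogram(bo_inputs, bins=bins)[0])
print("Uniform input counts per bin:", np.histogram(uniform_inputs, bins=bins)[0])
```

Printing the two histograms side by side gives a rough view of how far the queried inputs depart from a uniform spread. The exact counts depend on the assumed objective and hyperparameters.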
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Forecasting Techniques and Applications
