Efficient Bayesian Optimization with Deep Kernel Learning and Transformer Pre-trained on Multiple Heterogeneous Datasets
Wenlong Lyu, Shoubo Hu, Jie Chuai, Zhitang Chen

TL;DR
This paper introduces a pre-trained Gaussian process surrogate model with a deep kernel learned from a Transformer encoder, enabling efficient Bayesian optimization across multiple heterogeneous tasks.
Contribution
It proposes a novel pre-training approach for Gaussian process surrogates using deep kernels from Transformer features, improving transfer learning in Bayesian optimization.
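As a rough illustration of what a Gaussian process with a deep kernel looks like, the sketch below applies an RBF kernel to learned features instead of raw inputs and computes the usual GP posterior. It is a minimal sketch under stated assumptions, not the authors' implementation: `encode` is a hypothetical one-layer stand-in for the Transformer encoder, and the weights `W`, the lengthscale, and the noise level are arbitrary illustrative values.

```python
import numpy as np

def encode(X, W):
    """Hypothetical stand-in for the Transformer encoder: a single tanh layer."""
    return np.tanh(X @ W)

def rbf(A, B, lengthscale=1.0):
    """RBF kernel between the rows of A and the rows of B (unit prior variance)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, W, noise=1e-2):
    """GP posterior mean and variance with the kernel evaluated on deep features."""
    Z_train, Z_test = encode(X_train, W), encode(X_test, W)
    K = rbf(Z_train, Z_train) + noise * np.eye(len(X_train))
    K_s = rbf(Z_test, Z_train)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - np.sum(v**2, axis=0)               # prior variance of the RBF kernel is 1
    return mean, np.maximum(var, 0.0)

# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))                        # untrained encoder weights
X_train = rng.uniform(-1, 1, size=(20, 3))
y_train = np.sin(X_train.sum(axis=1))
X_test = rng.uniform(-1, 1, size=(5, 3))
mean, var = gp_posterior(X_train, y_train, X_test, W)
```

In a pre-training setting, the encoder weights would be fitted on data from prior tasks (for example by maximizing the GP marginal likelihood) rather than drawn at random as they are here.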
Findings
Pre-trained models outperform existing methods on synthetic benchmarks.
The approach accelerates convergence in real-world optimization tasks.
Transfer learning enhances efficiency across diverse problem domains.
Abstract
Bayesian optimization (BO) is widely adopted in black-box optimization problems and it relies on a surrogate model to approximate the black-box response function. With the increasing number of black-box optimization tasks solved and even more to solve, the ability to learn from multiple prior tasks to jointly pre-train a surrogate model is long-awaited to further boost optimization efficiency. In this paper, we propose a simple approach to pre-train a surrogate, which is a Gaussian process (GP) with a kernel defined on deep features learned from a Transformer-based encoder, using datasets from prior tasks with possibly heterogeneous input spaces. In addition, we provide a simple yet effective mix-up initialization strategy for input tokens corresponding to unseen input variables and therefore accelerate new tasks' convergence. Experiments on both synthetic and real benchmark problems…
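The abstract does not spell out the mix-up initialization. One plausible reading, assumed here purely for illustration, is that the token for an unseen input variable starts as a random convex combination of the token embeddings learned during pre-training. The function name `mixup_init`, the Dirichlet weighting, and the embedding sizes below are all hypothetical choices.

```python
import numpy as np

def mixup_init(pretrained_tokens, rng):
    """Return a new token embedding as a random convex combination of pre-trained ones.

    Assumption: weights are drawn from a flat Dirichlet, so they are
    non-negative and sum to one.
    """
    weights = rng.dirichlet(np.ones(len(pretrained_tokens)))
    return weights @ pretrained_tokens

rng = np.random.default_rng(0)
pretrained_tokens = rng.normal(size=(6, 16))   # 6 known variables, 16-dim embeddings
new_token = mixup_init(pretrained_tokens, rng)
```

Because the weights form a convex combination, each coordinate of the new token stays within the range spanned by the pre-trained tokens, which gives the new variable a starting point inside the region the encoder has already seen.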
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Advanced Multi-Objective Optimization Algorithms
Methods: Gaussian Process
