The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Jian Wu, Peter I. Frazier

TL;DR
This paper introduces the parallel knowledge gradient method, a batch Bayesian optimization algorithm that finds global optima faster than previous batch methods, especially when function evaluations are noisy.
Contribution
It develops a batch Bayesian optimization algorithm that, by construction, selects the one-step Bayes-optimal batch of points to sample, and it provides an efficient strategy for computing that batch in practice.
Findings
Faster convergence to global optima than previous batch Bayesian optimization algorithms
Largest advantage when function evaluations are noisy
Demonstrated on synthetic test functions and on hyperparameter tuning of practical machine learning algorithms
Abstract
In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm: the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
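To make "one-step Bayes-optimal batch" concrete, the following is a minimal sketch of how the value of a candidate batch could be estimated by Monte Carlo. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: the objective is minimized, the Gaussian process posterior is restricted to a finite candidate set (so it is described by a mean vector `mu` and covariance matrix `cov`), and observation noise is homoscedastic with variance `noise_var`. The function name `qkg_monte_carlo` and all of its parameters are hypothetical.

```python
import numpy as np


def qkg_monte_carlo(mu, cov, batch, noise_var=0.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of a batch knowledge-gradient value (minimization).

    Assumptions (illustrative, not from the paper's text):
      mu    : (m,) posterior mean over a finite candidate set
      cov   : (m, m) posterior covariance over the same set
      batch : indices of the q candidates evaluated in parallel
    Returns min(mu) minus the expected minimum of the posterior mean
    after observing the batch.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    batch = np.asarray(batch, dtype=int)
    q = len(batch)
    rng = np.random.default_rng(seed)

    # Predictive covariance of the q (possibly noisy) observations.
    obs_cov = cov[np.ix_(batch, batch)] + noise_var * np.eye(q)
    # Small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(obs_cov + 1e-10 * np.eye(q))

    # The updated mean is mu + sigma_tilde @ Z with Z ~ N(0, I_q),
    # where sigma_tilde = cov[:, batch] @ inv(chol).T.
    sigma_tilde = np.linalg.solve(chol, cov[:, batch].T).T  # shape (m, q)

    z = rng.standard_normal((n_samples, q))
    updated_means = mu[None, :] + z @ sigma_tilde.T  # shape (n_samples, m)

    # Value of information: current best minus expected best after the batch.
    return mu.min() - updated_means.min(axis=1).mean()
```

In this sketch, choosing a batch would mean maximizing the returned value over candidate index sets. As a sanity check under the stated assumptions, with `mu = np.zeros(3)`, `cov = np.eye(3)`, `batch = [0]`, and no noise, the exact value is 1/sqrt(2π) ≈ 0.399, and the estimate should approach it as `n_samples` grows. Raising `noise_var` shrinks the value toward zero, since noisier observations move the posterior mean less.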
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research
