A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

Yucen Lily Li; Tim G. J. Rudner; Andrew Gordon Wilson

arXiv:2305.20028 · cs.LG · May 9, 2024 · 5 cites

TL;DR

This paper compares several Bayesian neural network (BNN) surrogate models for Bayesian optimization, identifies their strengths and weaknesses across problem types, and finds that the best choice of surrogate depends on the problem.

Contribution

It provides a comprehensive evaluation of BNN surrogate methods, including finite and infinite-width models, for Bayesian optimization across diverse problem settings.

Findings

01. HMC performs best for fully stochastic BNNs
02. Method ranking is highly problem-dependent
03. Infinite-width BNNs are promising in high dimensions

Abstract

Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs,…
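As a rough illustration of how a surrogate's predictive uncertainty guides the next query in Bayesian optimization, the sketch below treats a deep ensemble as the surrogate and scores candidates with expected improvement. It is a minimal sketch under stated assumptions, not the paper's implementation: this excerpt does not say which acquisition function the authors use, so expected improvement is assumed here, and the function names (`ensemble_posterior`, `expected_improvement`, `pick_next_query`) and the toy ensemble members are hypothetical.

```python
import math
import statistics


def ensemble_posterior(member_predictions):
    """Collapse the ensemble members' predictions at one input into (mean, std)."""
    mean = statistics.fmean(member_predictions)
    std = statistics.pstdev(member_predictions)
    return mean, std


def expected_improvement(mean, std, best_so_far):
    """Expected improvement for maximization under a Gaussian predictive (assumed)."""
    improvement = mean - best_so_far
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + std * pdf


def pick_next_query(candidates, predict_members, best_so_far):
    """Return the candidate with the highest expected improvement."""
    def score(x):
        mean, std = ensemble_posterior(predict_members(x))
        return expected_improvement(mean, std, best_so_far)
    return max(candidates, key=score)


# Hypothetical toy "ensemble": three hand-written functions standing in
# for independently trained networks.
members = [
    lambda x: -(x - 0.4) ** 2,
    lambda x: -(x - 0.5) ** 2,
    lambda x: -(x - 0.6) ** 2,
]
predict = lambda x: [f(x) for f in members]
candidates = [i / 10 for i in range(11)]
next_x = pick_next_query(candidates, predict, best_so_far=-0.05)
```

Swapping in a different surrogate (for example, HMC samples or a GP posterior) would only change how the mean and standard deviation are produced. The acquisition step stays the same.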

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 · accept, good paper · Confidence 4

Strengths

- The methods under comparison are carefully selected to span a wide range of possible BNNs, and experiments are nicely designed to unveil specific insights about the relative strengths/weaknesses of different families of methods.
- I think some of the conclusions/insights from the empirical comparisons can indeed be useful for future applications of Bayesian optimization, such as the competitiveness of deep kernel learning, the promising results of infinite-width BNNs in high-dimensional probl

Weaknesses

(1) I think the study would be more complete if another relevant line of work were discussed: using (non-Bayesian) neural networks as the surrogate in BO and using the neural tangent kernel (NTK) for exploration. The recent line of work on neural bandits has made it possible to use (non-Bayesian) neural networks as the surrogate in BO while still preserving the regret guarantee of BO by using the theory of the NTK. The relevance of neural bandits in BO has been shown by [1] below, and you can also

Reviewer 02 · Rating 8 · accept, good paper · Confidence 3

Strengths

1. The results are of interest to BO researchers and BO practitioners looking for potential methods of improving the effectiveness of BO in real-world applications.
2. The empirical investigation is extensive and carefully planned, covering several synthetic and real-world benchmarks, and provides support for many interesting hypotheses as well, such as the relative performance on high dimensional problems and the role of hyperparameters including network architecture.
3. This paper is a gold st

Weaknesses

1. For an empirical paper whose conclusions rest solely on the experimental results, 5 trials per experimental setup is too few, as evidenced by multiple plots having heavily overlapping confidence intervals, Figure 6 in particular.
2. A few clarifying questions; please see the Questions section.

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

- The topic is interesting and somewhat important for the community.
- The theory and the experiments are well presented and sound.

Weaknesses

- There is no discussion of the interaction between the surrogate and the acquisition function.
- I agree with the authors that time might not be relevant when function evaluations are expensive. Still, it is important to create an experiment assuming fast function evaluations and see whether the ranking holds.
- Although the dataset collection is diverse, the study is performed on a very small number of datasets. There is no guarantee that these findings extrapolate easily to new datasets.
- A

Code & Models

Videos

A Study of Bayesian Neural Network Surrogates for Bayesian Optimization · slideslive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Machine Learning and Data Classification

Methods: Deep Ensembles · Greedy Policy Search · Gaussian Process