Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization
Thomas Desautels (California Institute of Technology), Andreas Krause (ETH Zurich), Joel Burdick (California Institute of Technology)

TL;DR
This paper introduces GP-BUCB, a parallel algorithm for Gaussian process bandit optimization that efficiently balances exploration and exploitation in batch settings, with theoretical guarantees and real-world validation.
Contribution
It presents a novel parallelization method for Gaussian process bandits with proven regret bounds, extending sequential algorithms to batch scenarios.
Findings
The cumulative regret of the parallel GP-BUCB algorithm exceeds that of the sequential approach by only a constant factor, independent of the batch size B.
The algorithm effectively balances exploration and exploitation in batch experiments.
Empirical results demonstrate improved optimization in real-world applications.
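The regret finding can be written out as follows. This is a sketch using the standard definition of cumulative regret. The symbols f (the unknown payoff function), x^* (a maximizer of f), x_t (the arm pulled at step t), T (the total number of pulls) and C (the constant factor) are notation assumed here, not taken from the summary, and the exact conditions under which the bound holds are given in the paper and not reproduced here.

```latex
% Cumulative regret after T pulls (standard definition, assumed notation)
R_T = \sum_{t=1}^{T} \bigl( f(x^*) - f(x_t) \bigr)

% Shape of the claim: the batch algorithm's regret is at most a constant
% multiple of the sequential regret, with C independent of the batch size B
R_T^{\text{batch}} \le C \cdot R_T^{\text{sequential}}
```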
Abstract
Can one parallelize complex exploration-exploitation tradeoffs? As an example, consider the problem of optimal high-throughput experimental design, where we wish to sequentially design batches of experiments in order to simultaneously learn a surrogate function mapping stimulus to response and identify the maximum of the function. We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. We develop GP-BUCB, a principled algorithm for choosing batches, based on the GP-UCB algorithm for sequential GP optimization. We prove a surprising result: as compared to the sequential approach, the cumulative regret of the parallel algorithm only increases by a constant factor independent of the batch size B. Our results provide rigorous…
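To make the batch-selection idea concrete, here is a minimal Python sketch of an upper-confidence-bound rule applied to batches. It is an illustration under stated assumptions, not the authors' implementation. It assumes a finite set of candidate arms on a 1-D grid, a squared-exponential kernel with unit prior variance, and a fixed confidence weight `beta`; the function names, kernel, and parameter values are all hypothetical. The assumed mechanism is that the posterior mean stays fixed at its value from the last real feedback, while the posterior variance (which depends only on where arms were pulled, not on what was observed) is updated after each pending pick so that later picks in the batch move away from earlier ones.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def posterior(x_obs, y_obs, x_cand, noise_var=1e-2):
    """GP posterior mean and variance at x_cand (zero prior mean)."""
    if len(x_obs) == 0:
        # No data yet: return the prior (mean 0, variance 1).
        return np.zeros(len(x_cand)), np.ones(len(x_cand))
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_cand)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.maximum(var, 0.0)

def select_batch(x_obs, y_obs, x_cand, batch_size, beta=4.0):
    """Choose batch_size candidate indices with a batch UCB rule.

    beta is a hypothetical fixed confidence weight chosen for this sketch.
    """
    # Mean uses only real observations and is fixed for the whole batch.
    mean, _ = posterior(x_obs, y_obs, x_cand)
    x_pending = np.array(x_obs, dtype=float)
    picks = []
    for _ in range(batch_size):
        # Variance depends only on pull locations, so dummy targets suffice.
        _, var = posterior(x_pending, np.zeros(len(x_pending)), x_cand)
        idx = int(np.argmax(mean + np.sqrt(beta * var)))
        picks.append(idx)
        # Treat the pick as pending: it shrinks variance for later picks.
        x_pending = np.append(x_pending, x_cand[idx])
    return picks
```

A call such as `select_batch(np.array([]), np.array([]), np.linspace(0.0, 1.0, 50), batch_size=3)` returns three candidate indices. Because each pending pick lowers the variance nearby, the picks spread across the grid rather than repeating the same arm.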
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference
