Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization

Thomas Desautels (California Inst. of Technology); Andreas Krause (ETH Zurich); Joel Burdick (California Inst. of Technology)

arXiv:1206.6402·cs.LG·July 3, 2012·ICML·69 cites

TL;DR

This paper introduces GP-BUCB, a parallel algorithm for Gaussian process bandit optimization that efficiently balances exploration and exploitation in batch settings, with theoretical guarantees and real-world validation.

Contribution

The paper extends the sequential GP-UCB algorithm to the batch setting, where several arms are pulled in parallel in each round, and proves regret bounds for the resulting method, GP-BUCB.

Findings

01

The cumulative regret of GP-BUCB exceeds that of the sequential approach by only a constant factor, independent of the batch size B.

02

The algorithm effectively balances exploration and exploitation in batch experiments.

03

Empirical results demonstrate improved optimization in real-world applications.

Abstract

Can one parallelize complex exploration-exploitation tradeoffs? As an example, consider the problem of optimal high-throughput experimental design, where we wish to sequentially design batches of experiments in order to simultaneously learn a surrogate function mapping stimulus to response and identify the maximum of the function. We formalize the task as a multi-armed bandit problem, where the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. We develop GP-BUCB, a principled algorithm for choosing batches, based on the GP-UCB algorithm for sequential GP optimization. We prove a surprising result: as compared to the sequential approach, the cumulative regret of the parallel algorithm only increases by a constant factor independent of the batch size B. Our results provide rigorous…
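
The batch selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1-D grid of arms, a squared-exponential kernel, a fixed confidence weight `beta` (the paper uses a schedule), and a toy payoff function. All function names are hypothetical. Within a batch, the posterior mean stays frozen at the last real feedback while the posterior variance is updated for each pending arm, which is possible because the GP variance depends only on where arms were pulled, not on the observed payoffs.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between 1-D point sets a and b (assumed kernel).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-2):
    # GP posterior mean and standard deviation on x_grid (zero prior mean, unit prior variance).
    if len(x_obs) == 0:
        return np.zeros(len(x_grid)), np.ones(len(x_grid))
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_obs, x_grid)
    mean = Ks.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def select_batch(x_obs, y_obs, x_grid, batch_size, beta=4.0):
    # Choose a batch of arm indices with a UCB rule whose mean is frozen
    # at the last real feedback and whose variance shrinks as arms are queued.
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    mean, _ = gp_posterior(x_obs, y_obs, x_grid)
    x_pending = x_obs.copy()
    batch = []
    for _ in range(batch_size):
        # The variance ignores y, so placeholder zeros stand in for pending payoffs.
        _, std = gp_posterior(x_pending, np.zeros(len(x_pending)), x_grid)
        idx = int(np.argmax(mean + np.sqrt(beta) * std))
        batch.append(idx)
        x_pending = np.append(x_pending, x_grid[idx])
    return batch

def run(rounds=5, batch_size=4, seed=0):
    # Toy experiment: pull batches, observe noisy payoffs, accumulate regret.
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 101)
    f = np.sin(6.0 * x_grid) * x_grid  # hypothetical payoff function
    x_obs, y_obs = np.empty(0), np.empty(0)
    regret = 0.0
    for _ in range(rounds):
        batch = select_batch(x_obs, y_obs, x_grid, batch_size)
        y_new = f[batch] + 0.1 * rng.standard_normal(batch_size)
        x_obs = np.append(x_obs, x_grid[batch])
        y_obs = np.append(y_obs, y_new)
        regret += float(np.sum(f.max() - f[batch]))
    return x_obs, y_obs, regret
```

Calling `run(rounds=5, batch_size=4)` returns the pulled arms, their noisy payoffs, and the cumulative regret for the toy problem. Setting `batch_size=1` reduces the loop to a sequential UCB rule, which gives a simple baseline for comparing regret across batch sizes.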

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference