Variational Bayesian Optimistic Sampling

Brendan O'Donoghue; Tor Lattimore

arXiv:2110.15688·stat.ML·November 1, 2021

TL;DR

This paper introduces a class of Bayesian optimistic policies for online sequential decision problems, gives a unified Bayesian regret analysis for them, and extends that analysis to bilinear saddle-point problems through a flexible variational framework.

Contribution

The paper develops a new class of Bayesian optimistic policies, including a variational method that works with any posterior, and extends the regret analysis to bilinear saddle-point problems.

Findings

01

Optimistic policies achieve $\tilde{O}(\sqrt{AT})$ Bayesian regret.

02

In bilinear saddle-point problems, Thompson sampling can produce policies outside the optimistic set and suffer linear regret in some instances.

03

The variational approach allows flexible policy tuning and constraint incorporation.

Abstract

We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian `optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde{O}(\sqrt{AT})$ Bayesian regret for a problem with $A$ actions after $T$ rounds. We extend the regret analysis for optimistic policies to bilinear saddle-point problems which include zero-sum matrix games and constrained bandits as special cases. In this case we show that Thompson sampling can produce policies outside of the optimistic set and suffer linear regret in some instances. Finding a policy inside the optimistic set amounts to solving a convex optimization problem and we call the resulting algorithm `variational Bayesian optimistic…
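
The abstract notes that, in the stochastic multi-armed bandit case, Thompson sampling belongs to the optimistic set. As a rough illustration of that baseline only (not the variational algorithm the paper proposes), the sketch below runs Thompson sampling on a Bernoulli bandit. The Beta(1, 1) prior, the arm means, the horizon, and the function name `thompson_sampling` are all assumptions chosen for the example. It reports regret on one fixed problem instance, whereas the Bayesian regret in the abstract averages over problems drawn from the prior.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a fixed bandit instance.

    Hypothetical helper for illustration. Returns the cumulative
    regret against the best arm and the number of pulls per arm.
    """
    rng = random.Random(seed)
    num_actions = len(true_means)
    successes = [0] * num_actions  # observed rewards of 1 per arm
    failures = [0] * num_actions   # observed rewards of 0 per arm
    pulls = [0] * num_actions
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior
        # (assumed uniform Beta(1, 1) prior on each arm's mean).
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(num_actions)
        ]
        # Act greedily with respect to the sampled means.
        action = max(range(num_actions), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < true_means[action] else 0
        successes[action] += reward
        failures[action] += 1 - reward
        pulls[action] += 1

        # Accumulate expected regret relative to the best arm.
        regret += best_mean - true_means[action]

    return regret, pulls


regret, pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(f"cumulative regret: {regret:.1f}, pulls per arm: {pulls}")
```

With three arms and well-separated means, most pulls should concentrate on the best arm, so the cumulative regret grows much more slowly than the number of rounds.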

Videos

Variational Bayesian Optimistic Sampling · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms