Analysis of Thompson Sampling for Graphical Bandits Without the Graphs

Fang Liu; Zizhan Zheng; Ness Shroff

arXiv:1805.08930 · stat.ML · May 24, 2018 · 6 cites

TL;DR

This paper investigates Thompson Sampling in graphical bandit problems with evolving and partially unknown feedback graphs, establishing its optimality in undirected cases and proposing a new variant for directed graphs.

Contribution

The paper proves that the original Thompson Sampling is optimal, up to logarithmic factors, for undirected graphical bandits, and introduces a new variant that achieves near-optimal regret in the directed setting.

Findings

01 Thompson Sampling achieves near-optimal regret in undirected graphical bandits.

02 A new Thompson Sampling variant attains near-optimal regret in directed graphical bandits.

03 Algorithms are computationally efficient and do not require prior knowledge of feedback graphs.

Abstract

We study multi-armed bandit problems with graph feedback, in which the decision maker is allowed to observe the neighboring actions of the chosen action, in a setting where the graph may vary over time and is never fully revealed to the decision maker. We show that when the feedback graphs are undirected, the original Thompson Sampling achieves the optimal (within logarithmic factors) regret $\tilde{O}\left(\sqrt{\beta_{0}(G) T}\right)$ over time horizon $T$, where $\beta_{0}(G)$ is the average independence number of the latent graphs. To the best of our knowledge, this is the first result showing that the original Thompson Sampling is optimal for graphical bandits in the undirected setting. A slightly weaker regret bound of Thompson Sampling in the directed setting is also presented. To fill this gap, we propose a variant of Thompson Sampling that attains the optimal regret in the directed…
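
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Bernoulli rewards and Beta(1, 1) priors, neither of which is stated in the abstract, and the names `thompson_sampling_graph_feedback`, `means`, and `neighbors_at` are hypothetical. The learner never inspects the graph; it only receives whatever side observations arrive after each play, which is how the "without the graphs" aspect is modeled here.

```python
import random


def thompson_sampling_graph_feedback(means, neighbors_at, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with side observations (illustrative).

    means        -- true Bernoulli means; used only to simulate rewards (assumption)
    neighbors_at -- hypothetical callback (t, arm) -> iterable of arms whose
                    rewards are revealed at round t when `arm` is played
    horizon      -- number of rounds T
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1.0] * k   # Beta posterior parameter: 1 + observed failures
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        # Draw one posterior sample per arm and play the arm with the largest sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        regret += best - means[arm]

        # The played arm is always observed; neighbors are revealed only after the play.
        observed = set(neighbors_at(t, arm)) | {arm}
        for j in observed:
            reward = 1 if rng.random() < means[j] else 0
            alpha[j] += reward
            beta[j] += 1 - reward

    return regret, alpha, beta
```

Passing `lambda t, arm: []` as `neighbors_at` recovers the standard bandit with no side observations, while a callback that returns every arm corresponds to full-information feedback. A callback that depends on `t` models a graph that changes from round to round.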

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics