Analysis of Thompson Sampling for Graphical Bandits Without the Graphs
Fang Liu, Zizhan Zheng, Ness Shroff

TL;DR
This paper investigates Thompson Sampling in graphical bandit problems where the feedback graphs change over time and are never fully revealed, showing that it is optimal up to logarithmic factors in the undirected case and proposing a new variant for directed graphs.
Contribution
It proves that the original Thompson Sampling is optimal, up to logarithmic factors, for undirected graphical bandits and introduces a new variant that achieves near-optimal regret in the directed setting.
Findings
Thompson Sampling achieves near-optimal regret in undirected graphical bandits.
A new Thompson Sampling variant attains near-optimal regret in directed graphical bandits.
Algorithms are computationally efficient and do not require prior knowledge of feedback graphs.
Abstract
We study multi-armed bandit problems with graph feedback, in which the decision maker is allowed to observe the neighboring actions of the chosen action, in a setting where the graph may vary over time and is never fully revealed to the decision maker. We show that when the feedback graphs are undirected, the original Thompson Sampling achieves the optimal (within logarithmic factors) regret $\tilde{O}(\sqrt{\beta_0(G) T})$ over time horizon $T$, where $\beta_0(G)$ is the average independence number of the latent graphs. To the best of our knowledge, this is the first result showing that the original Thompson Sampling is optimal for graphical bandits in the undirected setting. A slightly weaker regret bound of Thompson Sampling in the directed setting is also presented. To fill this gap, we propose a variant of Thompson Sampling that attains the optimal regret in the directed…
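The undirected setting described in the abstract can be illustrated with a minimal sketch. It assumes Bernoulli rewards with Beta(1, 1) priors, which is a common textbook instantiation of Thompson Sampling and not necessarily the exact model analyzed in the paper. The names `thompson_sampling_graph_feedback`, `neighbors_at`, and `ring_neighbors` are hypothetical. The learner never inspects the graph when choosing an arm; it only receives the side observations that the environment reveals after each play.

```python
import random


def thompson_sampling_graph_feedback(means, neighbors_at, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with side observations (illustrative sketch).

    means        -- true Bernoulli means, used only by the simulated environment
    neighbors_at -- hypothetical callable (t, arm) -> arms revealed when `arm`
                    is played at round t; the learner never queries it to decide
    horizon      -- number of rounds T
    Returns (pseudo_regret, alpha, beta).
    """
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1] * k   # Beta posterior parameter: 1 + observed failures
    best = max(means)
    regret = 0.0

    for t in range(horizon):
        # Sample one mean per arm from its posterior and play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        regret += best - means[arm]

        # The environment reveals the played arm and its current neighbors.
        observed = sorted(set(neighbors_at(t, arm)) | {arm})
        for j in observed:
            reward = 1 if rng.random() < means[j] else 0
            alpha[j] += reward
            beta[j] += 1 - reward

    return regret, alpha, beta


def ring_neighbors(t, arm, k=5):
    """Assumed example graph: a fixed undirected ring over k arms."""
    return [(arm - 1) % k, (arm + 1) % k]


if __name__ == "__main__":
    regret, alpha, beta = thompson_sampling_graph_feedback(
        [0.2, 0.4, 0.5, 0.6, 0.9], ring_neighbors, horizon=2000
    )
    print("pseudo-regret:", round(regret, 2))
```

Because `neighbors_at` takes the round index `t`, a time-varying graph can be simulated by returning different neighbor sets in different rounds. The variant for directed graphs mentioned in the abstract is not sketched here, since this summary does not describe it.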
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
