Multi-Armed Bandits on Partially Revealed Unit Interval Graphs
Xiao Xu, Sattar Vakili, Qing Zhao, Ananthram Swami

TL;DR
This paper studies multi-armed bandit problems with side information modeled by unit interval graphs, proposing efficient learning policies that leverage the graph structure for improved decision-making in both fully and partially revealed settings.
Contribution
The paper introduces a two-step learning structure for multi-armed bandits with side information modeled by a unit interval graph (UIG): an offline reduction of the action space, followed by online aggregation of reward observations from similar arms.
Findings
The proposed policies are computationally efficient.
Their regret is order optimal in both the size of the action space and the time length.
The results cover both settings: a fully revealed UIG and a partially revealed UIG.
Abstract
A stochastic multi-armed bandit problem with side information on the similarity and dissimilarity across different arms is considered. The action space of the problem can be represented by a unit interval graph (UIG) where each node represents an arm and the presence (absence) of an edge between two nodes indicates similarity (dissimilarity) between their mean rewards. Two settings, complete and partial side information, are studied, distinguished by whether the UIG is fully revealed. A general two-step learning structure, consisting of an offline reduction of the action space and online aggregation of reward observations from similar arms, is proposed to fully exploit the topological structure of the side information. In both cases, the computational efficiency and the order optimality of the proposed learning policies in terms of both the size of the action space and the time length are…
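To make the side-information model in the abstract concrete, here is a minimal Python sketch. It is not the paper's policy. It assumes a specific similarity rule (two arms are adjacent when their mean rewards differ by less than a threshold `eps`), and the names `build_uig`, `aggregated_mean`, and `eps` are hypothetical. The first function builds the graph from known means for illustration only; the second shows the general idea of pooling reward observations from an arm and its neighbors.

```python
from itertools import combinations


def build_uig(means, eps):
    """Return adjacency sets for a similarity graph over arms.

    Assumption (not taken from the source): arms i and j are adjacent
    when |means[i] - means[j]| < eps.
    """
    adj = {i: set() for i in range(len(means))}
    for i, j in combinations(range(len(means)), 2):
        if abs(means[i] - means[j]) < eps:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def aggregated_mean(arm, adj, sums, counts):
    """Pool observations from `arm` and its neighbors into one estimate.

    sums[a] is the total reward observed from arm a, counts[a] the
    number of pulls. Returns 0.0 if the group has no observations.
    """
    group = adj[arm] | {arm}
    total_pulls = sum(counts[a] for a in group)
    if total_pulls == 0:
        return 0.0
    return sum(sums[a] for a in group) / total_pulls


# Hypothetical example: four arms, similarity threshold 0.25.
means = [0.1, 0.3, 0.5, 0.9]
adj = build_uig(means, eps=0.25)

# Hypothetical running totals of rewards and pull counts per arm.
sums = [1.0, 2.0, 3.0, 4.0]
counts = [2, 2, 2, 4]

print(adj)
print(aggregated_mean(0, adj, sums, counts))  # 0.75
```

Under the assumed rule, every neighbor's mean lies within `eps` of the arm's own mean, so the pooled estimate trades a bias of less than `eps` for a larger effective sample size. In the partially revealed setting only some of these edges would be known to the learner.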
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
