Graph Feedback Bandits on Similar Arms: With and Without Graph Structures
Han Qi, Fei Guo, Li Zhu, Qiaosheng Zhang

TL;DR
This paper studies stochastic multi-armed bandits with graph feedback in which connected arms are similar. It proposes new algorithms with regret bounds, extends them to arm sets that grow over time, and validates them through experiments.
Contribution
Introduces two UCB-based algorithms for graph feedback bandits, extends methods to ballooning arm scenarios, and develops graph-structure-independent algorithms with theoretical guarantees.
Findings
A regret lower bound is established for graph feedback bandits with similar arms.
Proposed algorithms achieve sub-linear regret bounds.
Algorithms validated through experiments.
Abstract
In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by applications in clinical trials and recommendation systems, we assume that two arms are connected if and only if they are similar (i.e., their means are close to each other). We establish a regret lower bound for this problem under the novel feedback structure and introduce two upper confidence bound (UCB)-based algorithms: Double-UCB, which has problem-independent regret upper bounds, and Conservative-UCB, which has problem-dependent upper bounds. Leveraging the similarity structure, we also explore a scenario where the number of arms increases over time (referred to as the "ballooning setting"). Practical applications of this scenario include Q&A platforms (e.g., Reddit, Stack Overflow, Quora) and product reviews on platforms like Amazon and Flipkart, where answers (or reviews)…
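The feedback model described in the abstract can be illustrated with a minimal simulation sketch. This is not the paper's Double-UCB or Conservative-UCB: it is a standard UCB1-style index that also updates on side observations, shown only to make the setting concrete. The function names, the similarity threshold `eps`, and the Bernoulli reward model are all assumptions made for illustration.

```python
import math
import random


def similarity_graph(means, eps):
    """Adjacency sets: arms i and j are connected iff |mu_i - mu_j| < eps.

    Every arm is its own neighbor (for eps > 0), so pulling an arm
    always reveals its own reward.
    """
    k = len(means)
    return [{j for j in range(k) if abs(means[i] - means[j]) < eps}
            for i in range(k)]


def run_ucb_with_graph_feedback(means, eps, horizon, seed=0):
    """Generic UCB1-style learner with graph feedback (illustrative only).

    Pulling an arm reveals a Bernoulli reward sample for that arm and
    for every neighbor in the similarity graph. Returns the cumulative
    pseudo-regret and the per-arm observation counts.
    """
    rng = random.Random(seed)
    k = len(means)
    neighbors = similarity_graph(means, eps)
    counts = [0] * k   # number of observations per arm (pulls + side info)
    sums = [0.0] * k   # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        def index(i):
            # Arms never observed get an infinite index so they are tried.
            if counts[i] == 0:
                return math.inf
            bonus = math.sqrt(2.0 * math.log(t) / counts[i])
            return sums[i] / counts[i] + bonus

        arm = max(range(k), key=index)
        regret += best - means[arm]

        # Graph feedback: observe the pulled arm and all of its neighbors.
        for j in neighbors[arm]:
            reward = 1.0 if rng.random() < means[j] else 0.0
            counts[j] += 1
            sums[j] += reward

    return regret, counts


if __name__ == "__main__":
    regret, counts = run_ucb_with_graph_feedback(
        means=[0.1, 0.15, 0.9], eps=0.1, horizon=2000
    )
    print("pseudo-regret:", round(regret, 2), "observations:", counts)
```

In this toy instance the two low-mean arms are neighbors, so a single pull of either one informs the estimates of both. This is the kind of side information the similarity structure provides.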
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications
