Bounded Regret for Finite-Armed Structured Bandits
Tor Lattimore, Remi Munos

TL;DR
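This paper introduces a new algorithm for structured bandit problems, in which the expected return of one arm may depend on the returns of other arms. Under certain conditions the algorithm achieves finite expected cumulative regret, and problem-dependent lower bounds show it is nearly optimal in at least some special cases.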
Contribution
The paper proposes a novel algorithm for structured bandits with interdependent arms and establishes finite regret bounds and near-optimality in specific scenarios.
Findings
Under certain circumstances, the algorithm achieves finite expected cumulative regret in structured bandit problems.
Problem-dependent lower bounds on the cumulative regret show that the algorithm is nearly optimal in at least some special cases.
The approach extends traditional bandit analysis to interdependent reward settings.
Abstract
We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.
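To make the setting concrete, the following is a minimal sketch of a toy structured bandit, not the paper's algorithm. It assumes a hypothetical two-armed instance in which a single unknown parameter theta fixes both means (theta for arm 0, -theta for arm 1), unit-variance Gaussian noise, and a plain greedy policy chosen only for illustration. The function name simulate and all of its parameters are invented here. The value tracked is cumulative regret: the sum over rounds of the gap between the best arm's mean and the mean of the arm actually pulled.

```python
import random


def simulate(theta, horizon, seed=0):
    """Toy two-armed structured bandit with means (theta, -theta).

    Hypothetical illustration only: the greedy policy below is an
    assumption made for this sketch, not the algorithm from the paper.
    Returns the cumulative regret after each round.
    """
    rng = random.Random(seed)
    means = [theta, -theta]   # one shared parameter links both arms
    best = max(means)
    signed_sum = 0.0          # running sum of noisy samples of theta
    regret = 0.0
    history = []
    for t in range(1, horizon + 1):
        # Estimate theta from every past pull, whichever arm was played.
        theta_hat = signed_sum / (t - 1) if t > 1 else 0.0
        arm = 0 if theta_hat >= 0 else 1
        reward = rng.gauss(means[arm], 1.0)
        # Arm 1 has mean -theta, so its negated reward is a sample of theta.
        signed_sum += reward if arm == 0 else -reward
        regret += best - means[arm]
        history.append(regret)
    return history


if __name__ == "__main__":
    curve = simulate(theta=0.5, horizon=1000, seed=1)
    print("cumulative regret after 1000 rounds:", curve[-1])
```

In this toy instance a pull of either arm is informative about theta, so the estimate keeps improving even when the policy only exploits. In an unstructured bandit, by contrast, pulling one arm says nothing about the others.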
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
