Bounded Regret for Finite-Armed Structured Bandits

Tor Lattimore; Remi Munos

arXiv:1411.2919 · cs.LG · November 12, 2014 · 82 cites

Open Access

TL;DR

This paper introduces a new algorithm for structured bandit problems where arm rewards are interdependent, achieving finite regret under certain conditions and nearly optimal performance in some cases.

Contribution

The paper proposes a novel algorithm for structured bandits with interdependent arms and establishes finite regret bounds and near-optimality in specific scenarios.

Findings

01. The algorithm achieves finite expected cumulative regret in certain structured bandit problems.

02. Lower bounds show the near-optimality of the proposed algorithm in some cases.

03. The approach extends traditional bandit analysis to interdependent reward settings.

Abstract

We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.
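
To make the setting concrete, here is a minimal sketch of a toy structured bandit in which regret can stay bounded. Everything in it is an illustrative assumption rather than the paper's construction: the two-arm structure (arm 0 has mean theta, arm 1 has mean -theta), the unit-variance Gaussian noise, the function name simulate, and the plain greedy rule, which is not the algorithm proposed in the paper. Because both arms share the single unknown theta, every pull is informative about both arms, so the learner may stop making mistakes after finitely many rounds.

```python
import random


def simulate(theta, horizon, seed=0):
    """Run a greedy learner on a hypothetical two-armed structured bandit.

    Assumed structure (illustrative only): arm 0 has mean theta and
    arm 1 has mean -theta, with unit-variance Gaussian reward noise.
    Returns the cumulative pseudo-regret, i.e. the sum over rounds of
    (best mean - mean of the arm actually played).
    """
    rng = random.Random(seed)
    means = [theta, -theta]
    best = max(means)
    theta_sum = 0.0  # running sum of unbiased samples of theta
    regret = 0.0
    for t in range(1, horizon + 1):
        if t == 1:
            arm = 0  # no data yet, so start with an arbitrary arm
        else:
            theta_hat = theta_sum / (t - 1)
            # Greedy choice: play the arm that is best under the estimate.
            arm = 0 if theta_hat >= 0 else 1
        reward = rng.gauss(means[arm], 1.0)
        # A reward from arm 1 estimates -theta, so flip its sign.
        theta_sum += reward if arm == 0 else -reward
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    for horizon in (1_000, 10_000, 100_000):
        print(horizon, simulate(theta=0.5, horizon=horizon))
```

In this sketch each wrong pull costs 2 * |theta|, and the estimate of theta sharpens on every round regardless of which arm is played. In an unstructured bandit the arms share no information, so a suboptimal arm has to be sampled again and again to rule it out, which is why regret there typically grows with the horizon.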

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms