Optimal Strategies for Graph-Structured Bandits

Hassan Saber (SEQUEL); Pierre Ménard (SEQUEL); Odalric-Ambrym Maillard (SEQUEL)

arXiv:2007.03224 · cs.IT · July 13, 2020

TL;DR

This paper studies optimal strategies for a structured multi-armed bandit problem in which a weight matrix over users bounds how far the arm means of any two users can differ. It derives regret lower bounds and proposes an efficient, asymptotically optimal algorithm.

Contribution

It introduces the IMED-GS* algorithm tailored for graph-structured bandits, which is computationally efficient and does not rely on forced exploration, improving upon existing methods.

Findings

01. IMED-GS* is asymptotically optimal.

02. The algorithm needs to solve a linear program only about log(T) times, where T is the time horizon.

03. Numerical results confirm the algorithm's strong performance.

Abstract

We study a structured variant of the multi-armed bandit problem specified by a set of Bernoulli distributions $\nu = (\nu_{a,b})_{a \in A, b \in B}$ with means $(\mu_{a,b})_{a \in A, b \in B} \in [0, 1]^{A \times B}$ and by a given weight matrix $\omega = (\omega_{b,b'})_{b,b' \in B}$, where $A$ is a finite set of arms and $B$ is a finite set of users. The weight matrix $\omega$ is such that for any two users $b, b' \in B$, $\max_{a \in A} |\mu_{a,b} - \mu_{a,b'}| \leq \omega_{b,b'}$. This formulation is flexible enough to capture various situations, from highly-structured scenarios ($\omega \in \{0, 1\}^{B \times B}$) to fully unstructured setups ($\omega \equiv 1$). We consider two scenarios depending on whether the learner…
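
The constraint that the weight matrix places on the means can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `satisfies_graph_structure`, the tolerance `tol`, and the example matrices are all hypothetical and chosen only for illustration. It computes $\max_{a \in A} |\mu_{a,b} - \mu_{a,b'}|$ for every pair of users and compares it with $\omega_{b,b'}$.

```python
import numpy as np

def satisfies_graph_structure(mu, omega, tol=1e-12):
    """Check max_a |mu[a, b] - mu[a, b']| <= omega[b, b'] for all user pairs.

    mu:    array of shape (num_arms, num_users), Bernoulli means in [0, 1].
    omega: array of shape (num_users, num_users), the weight matrix.
    tol:   hypothetical numerical tolerance (an assumption of this sketch).
    """
    mu = np.asarray(mu, dtype=float)
    omega = np.asarray(omega, dtype=float)
    # gaps[b, b'] = largest difference in mean over arms between users b and b'
    gaps = np.abs(mu[:, :, None] - mu[:, None, :]).max(axis=0)
    return bool(np.all(gaps <= omega + tol))

# Illustrative means: 2 arms, 3 users; users 0 and 1 are identical.
mu = np.array([[0.2, 0.2, 0.9],
               [0.7, 0.7, 0.1]])

# Fully unstructured setup: omega == 1 constrains nothing, since means lie in [0, 1].
omega_unstructured = np.ones((3, 3))

# Highly structured setup: a 0 entry forces two users to share all their means.
omega_structured = np.array([[0.0, 0.0, 1.0],
                             [0.0, 0.0, 1.0],
                             [1.0, 1.0, 0.0]])

# A weight matrix that is too tight for these means (user 2 differs from the others).
omega_too_tight = np.zeros((3, 3))

print(satisfies_graph_structure(mu, omega_unstructured))
print(satisfies_graph_structure(mu, omega_structured))
print(satisfies_graph_structure(mu, omega_too_tight))
```

With $\omega \equiv 1$ every mean matrix in $[0, 1]^{A \times B}$ passes the check, which is why that case reduces to the unstructured problem; a zero entry $\omega_{b,b'} = 0$ instead forces users $b$ and $b'$ to have identical means on every arm.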

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics