On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits

Naumaan Nayyar; Dileep Kalathil; Rahul Jain

arXiv:1505.00553·stat.ML·December 2, 2016

TL;DR

This paper introduces decentralized policies for multi-player multi-armed bandit problems that achieve near-logarithmic regret growth, improving on previous bounds and addressing the challenge of coordinating players who have no separate communication channel.

Contribution

Proposes two new decentralized algorithms, E^3 and E^3-TS, that attain near-logarithmic regret growth in multi-player multi-armed bandits, reducing the regret gap in decentralized learning.

Findings

01

Regret grows at most as O(log^{1+ε} T) with the new policies.

02

Improves regret bounds from O(log^2 T) to near O(log T).

03

Addresses the additional cost of decentralized learning, showing that regret grows at most a factor of log^ε T faster than the O(log T) rate.
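To give a feel for how far apart the growth rates named in the findings are, the following sketch evaluates log T, log^{1+ε} T, and log^2 T at a few horizons. It is only an illustration: the function name `regret_rates` and the value ε = 0.1 are assumptions made here, and the constants hidden by the O(·) notation are ignored.

```python
import math


def regret_rates(T, eps=0.1):
    """Return (log T, log^{1+eps} T, log^2 T) for a horizon T > 1.

    eps=0.1 is an illustrative choice, not a value taken from the paper.
    """
    log_t = math.log(T)
    return log_t, log_t ** (1 + eps), log_t ** 2


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        log_t, near_log, log_sq = regret_rates(T)
        print(f"T={T:>10}  log T={log_t:6.2f}  "
              f"log^1.1 T={near_log:6.2f}  log^2 T={log_sq:7.2f}")
```

For small ε the middle rate stays close to log T, while log^2 T pulls away as T grows.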

Abstract

We consider the problem of learning in single-player and multiplayer multiarmed bandit models. Bandit problems are classes of online learning problems that capture exploration versus exploitation tradeoffs. In a multiarmed bandit model, players can pick among many arms, and each play of an arm generates an i.i.d. reward from an unknown distribution. The objective is to design a policy that maximizes the expected reward over a time horizon for a single player setting and the sum of expected rewards for the multiplayer setting. In the multiplayer setting, arms may give different rewards to different players. There is no separate channel for coordination among the players. Any attempt at communication is costly and adds to regret. We propose two decentralizable policies, $E^3$ (E-cubed) and $E^3$-TS, that can be used in both single player and multiplayer settings.…
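The single-player model described in the abstract, together with the notion of regret, can be sketched with a small simulation. This is a generic explore-then-exploit baseline written for illustration only; it is not the paper's $E^3$ or $E^3$-TS policy. The Bernoulli reward distributions, the function name `explore_then_exploit`, and all parameter values are assumptions introduced here.

```python
import random


def explore_then_exploit(means, horizon, explore_rounds, seed=0):
    """Simulate a single-player bandit and return cumulative expected regret.

    Hypothetical baseline for illustration, NOT the paper's E^3 / E^3-TS.
    means: true Bernoulli means of the arms (unknown to the player).
    explore_rounds: number of round-robin passes over the arms (>= 1).
    """
    if explore_rounds < 1:
        raise ValueError("explore_rounds must be at least 1")
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    best_mean = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < explore_rounds * n_arms:
            arm = t % n_arms  # exploration: cycle through every arm
        else:
            # exploitation: play the arm with the highest empirical mean
            arm = max(range(n_arms), key=lambda a: totals[a] / pulls[a])
        # each play yields an i.i.d. reward from the arm's distribution
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        # expected loss relative to always playing the best arm
        regret += best_mean - means[arm]
    return regret


if __name__ == "__main__":
    print(explore_then_exploit([0.2, 0.5, 0.8], horizon=10_000, explore_rounds=50))
```

Regret here is measured against a player who always pulls the arm with the highest mean, so every exploratory pull of a worse arm adds to it. That is the exploration versus exploitation tradeoff the abstract refers to.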
