Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

Shuang Liu; Cheng Chen; Zhihua Zhang

arXiv:1504.03509·cs.LG·February 13, 2020

TL;DR

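This paper investigates the trade-off between regret and communication in distributed stochastic multi-armed bandits. For a known time horizon it proposes a strategy that needs only one round of communication and whose regret does not scale with the number of players; for an unknown horizon it characterizes how regret depends on the frequency of communication.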

Contribution

The paper introduces the Over-Exploration strategy for known horizons and, for unknown horizons, characterizes the relationship between regret and communication through a new notion, the density of the communication set.

Findings

01. With a known horizon, one round of communication suffices for regret that does not scale with the number of players.

02. With an unknown horizon, a lower bound on regret is established in terms of the density of the communication set.

03. Stable strategies are developed that match this lower bound.

Abstract

In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their history of observations with each other at specific points in time. We study the relationship between regret and communication. When the time horizon is known, we propose the Over-Exploration strategy, which only requires one-round communication and whose regret does not scale with the number of players. When the time horizon is unknown, we measure the frequency of communication through a new notion called the density of the communication set, and give an exact characterization of the interplay between regret and communication. Specifically, a lower bound is established and stable strategies that match the lower bound are developed. The results and analyses in this paper are specific but can…
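The setting described in the abstract can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's Over-Exploration strategy: each player runs standard UCB1 on Bernoulli arms, and at every time step listed in `comm_times` all players pool their observation histories. The function name `simulate` and all of its parameters are hypothetical names chosen for this illustration.

```python
import math
import random


def simulate(means, num_players, horizon, comm_times, seed=0):
    """Return the pseudo-regret summed over all players.

    Assumptions (not from the paper): Bernoulli arms with the given
    means, UCB1 as each player's local policy, and full pooling of
    histories at every time step contained in comm_times.
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    # Statistics already shared by everyone (pull counts, reward sums).
    shared_n = [0] * k
    shared_s = [0.0] * k
    # Statistics each player has gathered since the last communication.
    local_n = [[0] * k for _ in range(num_players)]
    local_s = [[0.0] * k for _ in range(num_players)]
    regret = 0.0
    for t in range(1, horizon + 1):
        for p in range(num_players):
            # A player's view is the shared history plus its own recent pulls.
            n = [shared_n[a] + local_n[p][a] for a in range(k)]
            s = [shared_s[a] + local_s[p][a] for a in range(k)]
            if 0 in n:
                arm = n.index(0)  # try every arm at least once
            else:
                total = sum(n)
                arm = max(
                    range(k),
                    key=lambda a: s[a] / n[a]
                    + math.sqrt(2 * math.log(total) / n[a]),
                )
            reward = 1.0 if rng.random() < means[arm] else 0.0
            local_n[p][arm] += 1
            local_s[p][arm] += reward
            regret += best - means[arm]
        if t in comm_times:
            # Communication round: merge every local history into the
            # shared one, then reset the local buffers.
            for p in range(num_players):
                for a in range(k):
                    shared_n[a] += local_n[p][a]
                    shared_s[a] += local_s[p][a]
                    local_n[p][a] = 0
                    local_s[p][a] = 0.0
    return regret
```

Calling `simulate([0.9, 0.5], num_players=4, horizon=200, comm_times={50})` models a single communication round at step 50, while passing a denser set of times models more frequent communication. Comparing the returned regret across different `comm_times` gives a rough empirical feel for the trade-off the paper analyzes.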

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Game Theory and Applications