Solving Multi-Arm Bandit Using a Few Bits of Communication
Osama A. Hanna, Lin F. Yang, Christina Fragouli

TL;DR
This paper introduces QuBan, a reward quantization algorithm that enables multi-armed bandit learning over wireless networks with minimal communication, requiring as few as 3 bits per reward without incurring additional regret.
Contribution
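The paper provides a generic reward quantization method, QuBan, that can be applied on top of any no-regret MAB algorithm to reduce communication in distributed MAB problems without increasing regret, supported by nearly matching upper and lower bounds on the number of bits needed per reward.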
Findings
QuBan needs as few as 3 bits of communication per reward.
Nearly matching upper and lower bounds characterize the number of bits per reward needed to learn without additional regret.
Numerical experiments validate the effectiveness of the proposed method.
Abstract
The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart,…
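As a rough illustration of what quantizing a reward to a few bits can look like, the sketch below has an agent send a 3-bit code for the stochastically rounded difference between its reward and a reference value shared with the learner. This is not the QuBan algorithm itself, whose details the truncated abstract does not give. The function names, the `center` parameter, the fixed 3-bit window, and the clipping rule are all assumptions made for illustration.

```python
import math
import random


def stochastic_round(x, rng):
    """Round x to floor(x) or floor(x) + 1 so that the expected result equals x."""
    low = math.floor(x)
    return low + (1 if rng.random() < x - low else 0)


def encode(reward, center, rng, bits=3):
    """Agent side (hypothetical): map reward - center to an unsigned `bits`-bit code."""
    half = 2 ** (bits - 1)
    offset = stochastic_round(reward - center, rng)
    # Assumption: values outside the window are clipped, which introduces bias
    # whenever the reward is far from the shared reference value.
    offset = max(-half, min(half - 1, offset))
    return offset + half  # integer in [0, 2**bits)


def decode(code, center, bits=3):
    """Learner side (hypothetical): recover the quantized reward from the code."""
    half = 2 ** (bits - 1)
    return center + (code - half)


if __name__ == "__main__":
    rng = random.Random(0)
    center = 0.0  # e.g. the learner's current estimate of the arm's mean
    code = encode(0.3, center, rng)
    print(code, decode(code, center))
```

When the reward stays within the window around `center`, stochastic rounding keeps the decoded value unbiased, so averaging many decoded rewards still estimates the true mean. The learner can then feed the decoded values to an unmodified bandit algorithm.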
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
