Communication-Constrained Bandits under Additive Gaussian Noise

Prathamesh Mayekar, Jonathan Scarlett, and Vincent Y.F. Tan

arXiv:2304.12680 · cs.LG · June 7, 2023 · 1 citation

TL;DR

This paper investigates the fundamental limits of distributed multi-armed bandits whose reward feedback is communication-constrained and corrupted by Gaussian noise, and proposes a near-optimal algorithm whose regret nearly matches the minimax lower bound.

Contribution

It derives an information-theoretic lower bound on the minimax regret and introduces the $\texttt{UE-UCB++}$ algorithm, which matches this bound up to a small additive factor in the communication-constrained, noisy setting.

Findings

01

Lower bound on minimax regret: $\Omega\big(\sqrt{\frac{KT}{\mathrm{SNR} \wedge 1}}\big)$

02

Proposed $\texttt{UE-UCB++}$ matches the lower bound up to a small additive factor

03

The algorithm refines its reward estimates over multiple exploration phases, adapting how rewards are encoded from one phase to the next.
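Up to constant factors, the minimax lower bound scales as $\sqrt{KT/(\mathrm{SNR} \wedge 1)}$. The short Python sketch below evaluates that scale to show its two regimes. The helper name `regret_lower_bound_scale` is hypothetical, and the value it returns is an order-of-magnitude scale with constants omitted, not a regret figure reported in the paper.

```python
import math


def regret_lower_bound_scale(num_arms: int, horizon: int, snr: float) -> float:
    """Hypothetical helper: sqrt(K * T / min(SNR, 1)), constant factors omitted."""
    if num_arms < 1 or horizon < 1 or snr <= 0:
        raise ValueError("need K >= 1, T >= 1 and SNR > 0")
    return math.sqrt(num_arms * horizon / min(snr, 1.0))


# For SNR >= 1 the min(SNR, 1) term is 1, leaving the classical sqrt(K*T) scale.
print(regret_lower_bound_scale(10, 10_000, 4.0))   # roughly 316.2
# For SNR = 0.01 the scale grows by a factor of 1 / sqrt(0.01) = 10.
print(regret_lower_bound_scale(10, 10_000, 0.01))  # roughly 3162.3
```

Because of the $\wedge 1$, raising the SNR above 1 brings no further improvement in this bound: the scale saturates at $\sqrt{KT}$, the familiar minimax rate for stochastic $K$-armed bandits with noiseless feedback.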

Abstract

We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $\sigma^{2}$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $\Omega\big(\sqrt{\frac{KT}{\mathrm{SNR} \wedge 1}}\big)$ on the minimax regret of any scheme, where $\mathrm{SNR} := \frac{P}{\sigma^{2}}$, and $K$ and $T$ are the number of arms and time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\texttt{UE-UCB++}$, which matches this lower bound to a minor additive factor.…
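To make the feedback model concrete, here is a minimal Python simulation built on assumptions of our own: rewards are Bernoulli and therefore lie in $[0, 1]$, and the client uses a naive linear encoder $x = \sqrt{P}\,r$, which satisfies $\mathbb{E}[x^2] \le P$ because $r^2 \le 1$. This is not the encoding protocol used by $\texttt{UE-UCB++}$, and the helpers `transmit`, `decode` and `estimate_arm_mean` are hypothetical names chosen for illustration.

```python
import math
import random
import statistics


def transmit(reward: float, power: float, noise_std: float, rng: random.Random) -> float:
    """Hypothetical linear encoder plus channel: scale a reward in [0, 1] by
    sqrt(P), then add Gaussian noise N(0, sigma^2). Since reward**2 <= 1,
    the encoded value has second moment at most P."""
    if not 0.0 <= reward <= 1.0:
        raise ValueError("this sketch assumes rewards in [0, 1]")
    encoded = math.sqrt(power) * reward
    return encoded + rng.gauss(0.0, noise_std)


def decode(received: float, power: float) -> float:
    """Unbiased reward estimate: undo the sqrt(P) scaling of the encoder."""
    return received / math.sqrt(power)


def estimate_arm_mean(mean: float, power: float, noise_std: float,
                      pulls: int, rng: random.Random) -> tuple[float, float]:
    """Pull one Bernoulli(mean) arm repeatedly (an assumption of this sketch)
    and return the sample mean and sample variance of the decoded rewards."""
    decoded = []
    for _ in range(pulls):
        reward = 1.0 if rng.random() < mean else 0.0
        decoded.append(decode(transmit(reward, power, noise_std, rng), power))
    return statistics.fmean(decoded), statistics.variance(decoded)


rng = random.Random(0)
P, sigma = 1.0, 2.0  # SNR = P / sigma^2 = 0.25
mean_hat, var_hat = estimate_arm_mean(0.3, P, sigma, 20_000, rng)
# Expected per-sample variance: 0.3 * 0.7 + sigma^2 / P = 0.21 + 4 = 4.21.
print(round(mean_hat, 2), round(var_hat, 2))
```

Dividing by $\sqrt{P}$ leaves an unbiased estimate whose channel-noise variance is $\sigma^2/P = 1/\mathrm{SNR}$. Heuristically, once the SNR drops below 1 this term dominates the variance of a $[0,1]$-bounded reward, so each arm needs roughly $1/\mathrm{SNR}$ times as many pulls to reach the same accuracy. That reading is consistent with the $\mathrm{SNR} \wedge 1$ term in the lower bound, but it is an intuition for this naive encoder, not the paper's proof.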

Videos

Communication-Constrained Bandits under Additive Gaussian Noise (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Age of Information Optimization