Streaming Algorithms for Stochastic Multi-armed Bandits
Arnab Maiti, Vishakha Patil, Arindam Khan

TL;DR
This paper investigates streaming algorithms for stochastic multi-armed bandits with limited memory, providing new bounds for regret and proposing algorithms for best-arm identification under memory constraints.
Contribution
It establishes an almost tight regret lower bound for bounded-memory bandits and introduces adaptive streaming algorithms for efficient best-arm identification.
Findings
Ω(T^{2/3}) lower bound on expected cumulative regret, holding even for arm-memory size n-1
An r-round adaptive streaming algorithm matching lower bounds
A heuristic with optimal sample complexity using minimal memory
Abstract
We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in memory at any time is bounded. The decision-maker can only pull arms that are present in memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight hardness result: Ω(T^{2/3}) cumulative regret in expectation for an arm-memory size of (n-1), where n is the number of arms. For best-arm identification, we study two algorithms. First, we present an O(r) arm-memory, r-round adaptive streaming algorithm to find an ε-best arm. In an r-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on…
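The bounded arm-memory setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it is a naive single-pass "champion versus challenger" baseline, written only to show the streaming constraint (arms arrive one at a time, only arms in memory can be pulled, and a discarded arm cannot be recalled). The Bernoulli reward model, the function names `pull` and `stream_champion`, and the fixed `pulls_per_arm` budget are all assumptions made for illustration.

```python
import random


def pull(mean, rng):
    """Hypothetical reward model: a Bernoulli reward with the given mean."""
    return 1.0 if rng.random() < mean else 0.0


def stream_champion(arm_means, pulls_per_arm, rng):
    """Naive single-pass baseline (an illustration, not the paper's method).

    Memory holds at most two arms at any time: the stored champion
    (its id and empirical mean) and the newly arrived challenger.
    """
    champion, champion_avg = None, -1.0
    for arm_id, mean in enumerate(arm_means):  # arms arrive one by one
        # Only an arm currently in memory may be pulled: here, the challenger.
        total = sum(pull(mean, rng) for _ in range(pulls_per_arm))
        avg = total / pulls_per_arm
        if avg > champion_avg:
            # The old champion is evicted and can never be pulled again.
            champion, champion_avg = arm_id, avg
    return champion


if __name__ == "__main__":
    rng = random.Random(0)
    print(stream_champion([0.1, 0.9, 0.2], pulls_per_arm=2000, rng=rng))
```

With a fixed per-arm budget this baseline gives no guarantee of the kind the paper studies. It only makes concrete why eviction decisions are irreversible in the streaming model, which is the source of the difficulty the abstract refers to.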
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Stochastic Gradient Optimization Techniques
