Streaming Algorithms for Stochastic Multi-armed Bandits

Arnab Maiti; Vishakha Patil; Arindam Khan

arXiv:2012.05142 · cs.LG · December 10, 2020

TL;DR

This paper investigates streaming algorithms for stochastic multi-armed bandits with limited arm-memory, proving an almost tight regret lower bound and proposing algorithms for best-arm identification under memory constraints.

Contribution

It establishes an almost tight regret lower bound for bounded-memory bandits and introduces adaptive streaming algorithms for efficient best-arm identification.

Findings

1. An Ω(T^{2/3}) expected regret lower bound, even with arm-memory of size n − 1
2. An O(r) arm-memory, r-round adaptive streaming algorithm for ε-best arm identification, matching lower bounds
3. A heuristic with optimal sample complexity using minimal memory

Abstract

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight hardness result. We show Ω(T^{2/3}) cumulative regret in expectation for arm-memory size of (n-1), where n is the number of arms and T is the time horizon. For best-arm identification, we study two algorithms. First, we present an O(r) arm-memory r-round adaptive streaming algorithm to find an ε-best arm. In an r-round adaptive streaming algorithm for best-arm identification, the arm pulls in each round are decided based on…
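The bounded arm-memory model described in the abstract can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not the authors' algorithm: `BernoulliArm`, `BoundedArmMemory`, and `streaming_champion` are names introduced here, Bernoulli rewards are an assumption, and the "champion vs. challenger" rule is a naive single-pass baseline with arm-memory 2 and a fixed per-duel pull budget. It only shows the two constraints the abstract states: an arm must be in memory to be pulled, and an arm that is evicted cannot be recovered because the stream is not rewound.

```python
import random


class BernoulliArm:
    """Hypothetical arm model: each pull returns 1 with an unknown probability `mean`."""

    def __init__(self, mean: float):
        self.mean = mean

    def pull(self, rng: random.Random) -> int:
        return 1 if rng.random() < self.mean else 0


class BoundedArmMemory:
    """Holds at most `capacity` arms; only arms currently stored can be pulled."""

    def __init__(self, capacity: int, rng: random.Random):
        self.capacity = capacity
        self.rng = rng
        self.slots = {}  # arm_id -> BernoulliArm
        self.pulls = 0   # total pulls made so far (sample complexity)

    def store(self, arm_id: int, arm: BernoulliArm) -> None:
        if len(self.slots) >= self.capacity:
            raise MemoryError("arm-memory is full; evict an arm first")
        self.slots[arm_id] = arm

    def evict(self, arm_id: int) -> None:
        # An evicted arm is gone for good: the stream cannot be rewound.
        del self.slots[arm_id]

    def pull(self, arm_id: int) -> int:
        if arm_id not in self.slots:
            raise KeyError(f"arm {arm_id} is not in memory and cannot be pulled")
        self.pulls += 1
        return self.slots[arm_id].pull(self.rng)


def streaming_champion(stream, pulls_per_duel: int, seed: int = 0):
    """Naive single-pass baseline with arm-memory 2 (NOT the paper's algorithm).

    Keeps one champion; each arriving arm duels it with `pulls_per_duel`
    pulls apiece, and the loser (the challenger on ties) is evicted.
    """
    mem = BoundedArmMemory(capacity=2, rng=random.Random(seed))
    champ_id = None
    for arm_id, arm in enumerate(stream):
        mem.store(arm_id, arm)
        if champ_id is None:
            champ_id = arm_id
            continue
        champ_reward = sum(mem.pull(champ_id) for _ in range(pulls_per_duel))
        chall_reward = sum(mem.pull(arm_id) for _ in range(pulls_per_duel))
        if chall_reward > champ_reward:
            mem.evict(champ_id)
            champ_id = arm_id
        else:  # ties keep the current champion
            mem.evict(arm_id)
    return champ_id, mem.pulls


if __name__ == "__main__":
    stream = [BernoulliArm(m) for m in (0.2, 0.5, 0.9, 0.3)]
    best, total_pulls = streaming_champion(stream, pulls_per_duel=500)
    print(f"kept arm {best} after {total_pulls} pulls")
```

With a fixed budget per duel, the pull count is simply 2 × `pulls_per_duel` per arriving challenger, and nothing in this baseline guarantees an ε-best output: the champion is re-sampled in every duel, so a slightly worse arm can still win by chance. Obtaining ε-best guarantees with small arm-memory and controlled sample complexity is the problem the paper's algorithms address.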

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Stochastic Gradient Optimization Techniques