Stochastic Multi-armed Bandits in Constant Space

David Liau; Eric Price; Zhao Song; Ger Yang

arXiv:1712.09007 · cs.DS · May 17, 2018

TL;DR

This paper introduces a space-efficient algorithm for stochastic multi-armed bandits that operates with constant memory, achieving near-optimal regret bounds in a setting where recording all arm outcomes is infeasible.

Contribution

The paper presents the first constant-space algorithm for stochastic bandits whose regret is within a logarithmic factor of the optimum. This addresses memory constraints in large-scale or resource-limited environments.
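To make "constant space" concrete, here is a hypothetical champion-versus-challenger sketch. It visits the arms once in order, remembers only the index of the current best-looking arm plus a few counters, and then commits to that arm. This is NOT the algorithm from the paper. The names `duel` and `champion_challenger`, the Bernoulli reward model, and the fixed duel length `pulls_per_duel` are illustrative assumptions. The sketch carries no regret guarantee.

```python
import random

def duel(mean_a, mean_b, pulls, rng):
    """Pull two Bernoulli arms `pulls` times each; keep only two win counters."""
    wins_a = wins_b = 0
    for _ in range(pulls):
        wins_a += int(rng.random() < mean_a)
        wins_b += int(rng.random() < mean_b)
    return wins_a, wins_b

def champion_challenger(means, horizon, pulls_per_duel, seed=0):
    """Hypothetical streaming strategy (not the paper's method): visit arms once,
    remember only the current champion, then commit to it for the remaining pulls.

    `means` plays the role of the environment. The learner never stores statistics
    for past arms; its own state is a handful of integers, i.e. O(1) words.
    """
    rng = random.Random(seed)
    champion, pulls_used, total_reward = 0, 0, 0
    for challenger in range(1, len(means)):
        if pulls_used + 2 * pulls_per_duel > horizon:
            break  # not enough budget left for another full duel
        wins_champ, wins_chal = duel(means[champion], means[challenger], pulls_per_duel, rng)
        total_reward += wins_champ + wins_chal
        pulls_used += 2 * pulls_per_duel
        if wins_chal > wins_champ:  # challenger takes over; the old counts are discarded
            champion = challenger
    # Exploit: spend any remaining budget on the surviving champion.
    for _ in range(horizon - pulls_used):
        total_reward += int(rng.random() < means[champion])
    return champion, total_reward

champion, reward = champion_challenger([0.2, 0.5, 0.9, 0.3], horizon=10_000, pulls_per_duel=500)
print(champion, reward)
```

A fixed duel length is the weak point of this sketch. It wastes pulls when gaps are large and can crown the wrong arm when gaps are small. A method with regret guarantees would presumably need to adapt how long each comparison runs to the unknown gaps.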

Findings

01

Achieves regret within an O(log 1/Δ) factor of the optimal regret achievable without space constraints, provided rewards are bounded away from 0 and 1.

02

Uses only O(1) words of space, independent of the number of arms.

03

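Gives a regret bound stated in terms of the per-arm gaps Δ_i and the gap Δ between the best and second-best arms. When rewards are bounded away from 0 and 1, it matches the unconstrained optimum up to the O(log 1/Δ) factor.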
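The bound from the abstract can be evaluated numerically and compared with the unconstrained rate Σ_i log T / Δ_i. The helper names below are hypothetical. The sketch assumes reward means in [0, 1] with a unique best arm. It sums only over suboptimal arms, since the best arm has Δ_i = 0. Constant factors are dropped, so read the values as rates rather than exact regrets.

```python
import math

def gap_terms(means):
    """Return (gaps Δ_i of suboptimal arms, Δ = best minus second-best mean).
    Assumes means in [0, 1] and a unique best arm."""
    ordered = sorted(means, reverse=True)
    best, second = ordered[0], ordered[1]
    if best == second:
        raise ValueError("bound is undefined when the best arm is tied (Δ = 0)")
    gaps = [best - m for m in means if best - m > 0]
    return gaps, best - second

def constant_space_bound(means, horizon):
    """Σ_i (1/Δ_i) · log(Δ_i/Δ) · log T over suboptimal arms, constants dropped."""
    gaps, delta = gap_terms(means)
    return sum(math.log(g / delta) / g for g in gaps) * math.log(horizon)

def unconstrained_rate(means, horizon):
    """Σ_i log T / Δ_i: the benchmark rate without space constraints."""
    gaps, _ = gap_terms(means)
    return sum(1.0 / g for g in gaps) * math.log(horizon)

means = [0.9, 0.8, 0.5, 0.3]
T = 10**6
_, delta = gap_terms(means)
ratio = constant_space_bound(means, T) / unconstrained_rate(means, T)
print(f"Δ = {delta:.2f}, ratio = {ratio:.3f}, log(1/Δ) = {math.log(1 / delta):.3f}")
```

As written, an arm whose gap equals Δ contributes a zero term. This is one reason to treat the expression as an order-of-growth rate. In every case the ratio stays below log(1/Δ), which is the content of the first finding.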

Abstract

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret \[ \sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T \] where $\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log 1/\Delta)$ factor of the optimum regret possible without space constraints.
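The $O(\log 1/\Delta)$ comparison can be unpacked in one step. This is our reconstruction, not a quote from the paper. It assumes rewards lie in $[0,1]$, so every gap satisfies $\Delta_i \le 1$. It takes as the unconstrained benchmark the rate $\Theta(\sum_i \log T/\Delta_i)$. That rate is the Lai–Robbins lower bound when reward means are bounded away from $0$ and $1$, because the KL divergence between two such arms is $\Theta(\Delta_i^2)$. The sum runs over suboptimal arms, since the best arm has $\Delta_i = 0$.

```latex
\Delta_i \le 1 \;\Longrightarrow\; \log\frac{\Delta_i}{\Delta} \le \log\frac{1}{\Delta}
\qquad \text{for every arm } i \text{ with } \Delta_i > 0,

\sum_{i:\,\Delta_i>0}\frac{1}{\Delta_i}\log\frac{\Delta_i}{\Delta}\log T
\;\le\; \log\frac{1}{\Delta}\sum_{i:\,\Delta_i>0}\frac{\log T}{\Delta_i}
\;=\; O\!\left(\log\frac{1}{\Delta}\right)\cdot\Theta\!\left(\sum_{i:\,\Delta_i>0}\frac{\log T}{\Delta_i}\right).
```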

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms