# Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

**Authors:** Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

arXiv: 1901.08387 · 2019-01-25

## TL;DR

This paper introduces a simple, efficient regret minimization algorithm for multi-armed bandits that operates with a constant amount of memory, suitable for both finite and infinite arm settings, and demonstrates its effectiveness through theoretical bounds and experiments.

## Contribution

The paper presents a novel constant-memory algorithm for regret minimization in multi-armed bandits, applicable to finite and infinite cases, improving over prior methods that require extensive memory or restrictive assumptions.

## Key findings

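- Achieves a regret upper bound of $O(KM + K^{1.5}\sqrt{T\log(T/MK)}/M)$ on any $K$-armed finite bandit instance while remembering statistics of at most $M$ arms.
- Extends the algorithm to achieve sub-linear quantile-regret.
- Verifies the algorithm's efficiency empirically through experiments.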
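
To get a feel for how the stated bound trades memory against regret, the expression inside the $O(\cdot)$ can be evaluated directly. The snippet below is a minimal sketch: the function name `regret_bound_order` is hypothetical, the hidden constant is taken to be 1, and `T/MK` is read as $T/(MK)$, which is an assumption about the notation rather than something this summary states.

```python
import math


def regret_bound_order(K: int, M: int, T: int) -> float:
    """Evaluate K*M + K^1.5 * sqrt(T * log(T / (M*K))) / M.

    Constants hidden by the O(.) notation are ignored (assumed to be 1),
    so only relative comparisons between settings are meaningful.
    """
    if K < 1 or M < 1:
        raise ValueError("K and M must be positive")
    if T <= M * K:
        # The logarithm is non-positive here, so the expression is not useful.
        raise ValueError("expression is only meaningful for T > M*K")
    return K * M + K ** 1.5 * math.sqrt(T * math.log(T / (M * K))) / M


if __name__ == "__main__":
    K, T = 100, 10**6
    for M in (2, 5, 10, 50):
        print(f"M={M:>3}  bound order ~ {regret_bound_order(K, M, T):,.0f}")
```

For a fixed horizon, the second term shrinks as $M$ grows while the $KM$ term grows, so the expression makes the memory versus regret trade-off explicit.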

## Abstract

In this paper, we propose a constant word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. Most of the existing regret minimisation algorithms need to remember the statistics of all the arms they encounter. This may become a problem for the cases where the number of available words of memory is limited. Designing an efficient regret minimisation algorithm that uses a constant number of words has long been interesting to the community. Some early attempts consider the number of arms to be infinite, and require the reward distribution of the arms to belong to some particular family. Recently, for finitely many-armed bandits, an explore-then-commit based algorithm (Liau et al., 2018) seems to escape such an assumption. However, due to the underlying PAC-based elimination, their method incurs a high regret. We present a conceptually simple and efficient algorithm that needs to remember statistics of at most $M$ arms, and for any $K$-armed finite bandit instance it enjoys an $O(KM + K^{1.5}\sqrt{T\log(T/MK)}/M)$ upper bound on regret. We extend it to achieve sub-linear *quantile-regret* (Roy Chaudhuri and Kalyanakrishnan, 2018) and empirically verify the efficiency of our algorithm via experiments.
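
The memory constraint described in the abstract (statistics for at most $M$ arms at any time) can be illustrated with a small simulation. The sketch below is not the paper's algorithm, whose details are not given in this summary. It is a simplified, hypothetical scheme: arms are loaded into an $M$-slot memory in blocks, a standard UCB1 rule runs on the arms in memory, and only the empirically best arm is carried into the next block. The function name `bounded_memory_bandit`, the Bernoulli rewards, and the equal per-block pull budget are all assumptions made for illustration.

```python
import math
import random


def bounded_memory_bandit(means, M, T, seed=0):
    """Illustrative block-wise UCB1 that stores statistics of at most M arms.

    `means` holds the Bernoulli means of the environment (not learner memory).
    Returns (total_reward, peak_arms_in_memory).
    """
    if M < 2:
        raise ValueError("this sketch needs M >= 2 (one carried arm plus one new arm)")
    rng = random.Random(seed)
    K = len(means)
    # Number of blocks needed to see every arm once, and pulls allotted per block.
    num_blocks = 1 + math.ceil(max(0, K - M) / (M - 1))
    budget = T // num_blocks

    memory = {}       # arm index -> [pulls, reward_sum]; never more than M entries
    next_arm = 0      # single counter for arms not yet loaded
    t = 0
    total_reward = 0.0
    peak = 0

    def ucb_index(stats, n):
        pulls, reward_sum = stats
        if pulls == 0:
            return math.inf  # force one pull of every newly loaded arm
        return reward_sum / pulls + math.sqrt(2 * math.log(n) / pulls)

    while t < T:
        # Fill free memory slots with arms that have not been seen yet.
        while len(memory) < M and next_arm < K:
            memory[next_arm] = [0, 0.0]
            next_arm += 1
        peak = max(peak, len(memory))
        last_block = next_arm >= K
        steps = (T - t) if last_block else min(budget, T - t)
        for _ in range(steps):
            n = sum(stats[0] for stats in memory.values()) + 1
            arm = max(memory, key=lambda a: ucb_index(memory[a], n))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            memory[arm][0] += 1
            memory[arm][1] += reward
            total_reward += reward
            t += 1
        if not last_block:
            # Keep only the empirically best arm; forget everything else.
            best = max(memory, key=lambda a: memory[a][1] / max(memory[a][0], 1))
            memory = {best: memory[best]}
    return total_reward, peak


if __name__ == "__main__":
    means = [0.1] * 9 + [0.9]
    reward, peak = bounded_memory_bandit(means, M=3, T=5000)
    print("total reward:", reward, "| peak arms in memory:", peak)
```

The point of the sketch is the bookkeeping: the learner's state is the `memory` dictionary plus a few counters, so its size depends on $M$ rather than on $K$. It makes no claim to match the regret guarantee quoted in the abstract.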

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.08387/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1901.08387/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1901.08387/full.md

---
Source: https://tomesphere.com/paper/1901.08387