# The Multi-Armed Bandit Problem: An Efficient Non-Parametric Solution

**Authors:** Hock Peng Chan

arXiv: 1703.08285 · 2019-01-17

## TL;DR

This paper proposes efficient non-parametric arm allocation procedures for the multi-armed bandit problem. Existing non-parametric procedures ($\epsilon$-greedy, Boltzmann exploration, BESA, and modified versions of UCB), unlike parametric UCB, are not efficient under general parametric settings.

## Contribution

The paper proposes non-parametric arm allocation procedures that are efficient in the sense established for the parametric UCB solutions of Lai and Robbins (1985) and Lai (1987), which achieve minimum regret.

## Key findings

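- Parametric UCB, with bounds built from the Kullback-Leibler information of the reward distributions, achieves minimum regret (Lai and Robbins, 1985; Lai, 1987)
- Existing non-parametric procedures ($\epsilon$-greedy, Boltzmann exploration, BESA, modified UCB) are not efficient under general parametric settings
- The proposed non-parametric procedures are efficient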

## Abstract

Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. In recent years there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Non-parametric arm allocation procedures like $\epsilon$-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under non-parametric settings. However unlike UCB these non-parametric procedures are not efficient under general parametric settings. In this paper we propose efficient non-parametric procedures.
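
To make the parametric baseline concrete, the following is a minimal sketch of a KL-based upper confidence bound for Bernoulli rewards. It is not the procedure proposed in the paper. The function names (`bernoulli_kl`, `kl_ucb_index`, `run_kl_ucb`), the exploration budget `log(t)`, and the arm means in the usage note are all assumptions chosen for illustration.

```python
import math
import random


def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler divergence KL(Bernoulli(p) || Bernoulli(q))."""
    eps = 1e-12  # clip away from 0 and 1 so the logarithms stay finite
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean: float, pulls: int, t: int, tol: float = 1e-6) -> float:
    """Approximate the largest q in [mean, 1] with pulls * KL(mean, q) <= log(t).

    The budget log(t) is an illustrative assumption; published variants
    use slightly different exploration terms.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:  # bisection: KL(mean, q) is increasing in q for q >= mean
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def run_kl_ucb(true_means, horizon, seed=0):
    """Simulate the index policy on Bernoulli arms; return pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the sample means
        else:
            arm = max(
                range(k),
                key=lambda a: kl_ucb_index(sums[a] / pulls[a], pulls[a], t),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    best = max(true_means)
    # Pseudo-regret: expected reward lost by not always pulling the best arm.
    regret = sum((best - m) * n for m, n in zip(true_means, pulls))
    return pulls, regret
```

Calling `run_kl_ucb([0.2, 0.8], horizon=2000)` returns the number of pulls per arm and the pseudo-regret of the run. The index depends on knowing that rewards are Bernoulli, which is the kind of parametric assumption that non-parametric procedures avoid.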

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.08285/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1703.08285/full.md

---
Source: https://tomesphere.com/paper/1703.08285