Regret bounds for Narendra-Shapiro bandit algorithms

Sébastien Gadat; Fabien Panloup; Sofiane Saadane

arXiv:1502.04874·math.PR·January 19, 2016·2 cites

Open Access

TL;DR

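This paper asks whether Narendra-Shapiro (NS) bandit algorithms are competitive in terms of regret. It shows that, up to an over-penalization modification, the penalized two-armed version has a pseudo-regret bounded by $C\sqrt{n}$, which is sublinear in $n$. It also extends convergence results to the multi-armed case and studies the associated piecewise deterministic Markov process (PDMP).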

Contribution

The paper provides the first regret bounds for penalized Narendra-Shapiro algorithms and extends convergence and ergodic properties to multi-armed bandits with PDMP analysis.

Findings

01

Pseudo-regret is bounded by C√n for penalized two-armed bandits.

02

Convergence results are extended to multi-armed bandits.

03

Ergodic properties of the associated PDMP are established.
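
As a rough illustration of the first finding, the sketch below simulates a penalized two-armed NS-type update and accumulates, along one trajectory, the gap between the arms times the probability of playing the worse arm. The expectation of that sum is the pseudo-regret. This is a minimal sketch under assumptions that do not come from this page: the update rule is the penalized scheme as it is commonly written following Lamberton and Pagès, the step sizes gamma_n = gamma1/sqrt(n+1) and rho_n = rho1/sqrt(n+1) are illustrative choices, the over-penalization modification is not included, and the function name and parameters are hypothetical.

```python
import random


def penalized_ns_two_armed(p1, p2, n_steps, gamma1=1.0, rho1=1.0, seed=0):
    """Simulate one path of a penalized two-armed NS-type scheme (illustrative).

    p1, p2  : success probabilities of arm 1 and arm 2 (Bernoulli rewards).
    x       : current probability of playing arm 1.
    Returns (final x, accumulated path-wise regret).
    """
    if not (0.0 < gamma1 <= 1.0 and 0.0 <= rho1 <= 1.0):
        raise ValueError("gamma1 must be in (0, 1] and rho1 in [0, 1]")

    rng = random.Random(seed)
    x = 0.5
    gap = abs(p1 - p2)
    regret = 0.0

    for n in range(1, n_steps + 1):
        # Assumed step sizes, both of order 1/sqrt(n); not taken from the paper.
        gamma = gamma1 / (n + 1) ** 0.5
        rho = rho1 / (n + 1) ** 0.5

        # Probability of playing the worse arm at this step, times the gap.
        regret += gap * ((1.0 - x) if p1 >= p2 else x)

        if rng.random() <= x:
            # Arm 1 is played.
            if rng.random() < p1:
                x += gamma * (1.0 - x)        # success: reinforce arm 1
            else:
                x -= gamma * rho * x          # failure: penalize arm 1
        else:
            # Arm 2 is played.
            if rng.random() < p2:
                x -= gamma * x                # success: reinforce arm 2
            else:
                x += gamma * rho * (1.0 - x)  # failure: penalize arm 2

    return x, regret
```

Calling `penalized_ns_two_armed(0.8, 0.2, 10_000)` returns the final probability of playing arm 1 and the regret accumulated along that path. Averaging the second value over many seeds gives a Monte Carlo estimate that can be plotted against $\sqrt{n}$ for a qualitative comparison.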

Abstract

Narendra-Shapiro (NS) algorithms are bandit-type algorithms that were introduced in the 1960s (with a view to applications in psychology and learning automata) and whose convergence has been intensively studied in the stochastic algorithm literature. In this paper, we address the following question: are the Narendra-Shapiro (NS) bandit algorithms competitive from a regret point of view? In our main result, we show that some competitive bounds can be obtained for such algorithms in their penalized version (introduced in \cite{Lamberton_Pages}). More precisely, up to an over-penalization modification, the pseudo-regret $\bar{R}_n$ related to the penalized two-armed bandit algorithm is uniformly bounded by $C\sqrt{n}$ (where $C$ is made explicit in the paper). We also generalize existing convergence and rates of convergence results to the multi-armed case of the…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms