Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, and Matteo Pirotta

arXiv:2112.06517·cs.LG·April 13, 2022

Open Access

TL;DR

This paper studies a multi-armed bandit problem where the learner receives noisy, possibly biased evaluations of each arm's reward and aims to select the top K arms to maximize cumulative reward over time.

Contribution

It introduces algorithms and theoretical guarantees for top K arm selection under noisy evaluations, with improved regret bounds for specific evaluation models.

Findings

01

Achieves $\tilde{O}(T^{2/3})$ regret in the general case

02

Achieves $\tilde{O}(\sqrt{T})$ regret with linear evaluation functions

03

Empirical results validate theoretical bounds and compare approaches

Abstract

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds. Under the assumption that at each round the true reward of each arm is drawn from a fixed distribution, we derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated. First, we show a $\tilde{O}(T^{2/3})$ regret in the general case when the observation functions are generalized linear functions of the true rewards. On the other hand, we show that an improved $\tilde{O}(\sqrt{T})$ regret can be derived when the observation functions are noisy linear functions of the true rewards. Finally, we report an empirical validation that confirms…
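To make the setting in the abstract concrete, here is a minimal simulation sketch of the noisy linear evaluation case. It is not the paper's algorithm. The arm means, slopes, biases, noise levels, and the `simulate` function are all hypothetical values chosen for illustration, and the per-round hindsight benchmark is an assumption rather than the paper's regret definition. The sketch only shows why ranking arms by raw evaluations can be misleading when the evaluations are biased.

```python
import random


def simulate(T=2000, K=2, seed=0):
    """Compare a naive top-K policy with a hindsight benchmark (illustrative only)."""
    rng = random.Random(seed)

    # Hypothetical problem instance, not taken from the paper.
    means = [0.2, 0.4, 0.6, 0.8]     # mean true reward of each arm
    slopes = [1.0, 0.5, 1.5, 0.3]    # assumed multiplicative distortion per arm
    biases = [0.5, 0.0, -0.4, 0.1]   # assumed additive bias per arm
    n_arms = len(means)

    naive_total = 0.0
    hindsight_total = 0.0
    for _ in range(T):
        # True rewards are drawn from a fixed distribution each round.
        rewards = [rng.gauss(m, 0.1) for m in means]
        # Noisy linear evaluations: the learner sees these before choosing.
        evals = [
            slopes[a] * rewards[a] + biases[a] + rng.gauss(0.0, 0.1)
            for a in range(n_arms)
        ]

        # Naive policy: trust the evaluations and take the K largest.
        chosen = sorted(range(n_arms), key=lambda a: evals[a], reverse=True)[:K]
        # Benchmark (assumption): the K arms with the largest true rewards.
        best = sorted(range(n_arms), key=lambda a: rewards[a], reverse=True)[:K]

        naive_total += sum(rewards[a] for a in chosen)
        hindsight_total += sum(rewards[a] for a in best)

    return naive_total, hindsight_total


if __name__ == "__main__":
    naive, hindsight = simulate()
    print(f"naive policy reward:  {naive:.1f}")
    print(f"hindsight benchmark:  {hindsight:.1f}")
```

In this hypothetical instance the arm with the highest mean reward has a small slope, so its evaluations look unattractive and the naive policy tends to skip it. A learner in this setting would need to estimate how evaluations relate to true rewards rather than rank by evaluations directly.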

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems