Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits
Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

TL;DR
This paper introduces adaptive algorithms based on upper-confidence bounds for efficiently estimating the means of multiple distributions with limited samples, accounting for unknown variances and distribution shapes.
Contribution
It proposes two novel UCB-based strategies for adaptive sampling in multi-armed bandits, with finite-sample analysis of their estimation error performance.
Findings
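The estimation error of the proposed strategies depends on the variances and shapes of the distributions.
Each strategy allocates samples adaptively, in proportion to a high-probability upper-confidence bound on each distribution's variance.
Finite-sample bounds quantify the excess estimation error relative to the optimal allocation, which assumes known variances.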
Abstract
In this paper, we study the problem of estimating uniformly well the mean values of several distributions given a finite budget of samples. If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance. However, in the more realistic case where the distributions are not known in advance, one needs to design adaptive sampling strategies in order to select which distribution to sample from according to the previously observed samples. We describe two strategies based on pulling the distributions a number of times that is proportional to a high-probability upper-confidence-bound on their variance (built from previous observed samples) and report a finite-sample performance analysis on the excess estimation error compared to the optimal allocation. We show…
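The allocation idea in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithms: the function names, the `bonus_scale` and `init_pulls` parameters, and the Hoeffding-style form of the confidence bonus are all hypothetical choices made here, and samples are assumed to lie in [0, 1]. The sketch pulls, at each step, the arm whose variance upper bound divided by its pull count is largest, which drives the pull counts toward being proportional to the upper bounds.

```python
import math
import random


def optimal_allocation(variances, budget):
    """Static allocation for KNOWN variances: pulls proportional to variance.

    Returns real-valued pull counts (no rounding), for reference only.
    """
    total = sum(variances)
    return [budget * v / total for v in variances]


def ucb_variance_sampling(arms, budget, delta=0.05, bonus_scale=1.0, init_pulls=2):
    """Adaptive allocation driven by an upper-confidence bound on each variance.

    arms: list of zero-argument callables, each returning one sample in [0, 1].
    The bonus term below is an assumed, illustrative choice, not the paper's.
    """
    k = len(arms)
    if budget < init_pulls * k:
        raise ValueError("budget too small for the initial pulls")

    # Pull every arm a few times so that empirical variances are defined.
    samples = [[arm() for _ in range(init_pulls)] for arm in arms]

    for _ in range(budget - init_pulls * k):
        scores = []
        for xs in samples:
            t = len(xs)
            mean = sum(xs) / t
            var = sum((x - mean) ** 2 for x in xs) / t  # empirical variance
            bonus = bonus_scale * math.sqrt(math.log(1.0 / delta) / (2 * t))
            # Variance upper bound divided by the number of pulls so far.
            scores.append((var + bonus) / t)
        chosen = max(range(k), key=scores.__getitem__)
        samples[chosen].append(arms[chosen]())

    counts = [len(xs) for xs in samples]
    means = [sum(xs) / len(xs) for xs in samples]
    return counts, means


# Usage: one high-variance arm and one low-variance arm (both hypothetical).
rng = random.Random(0)
arms = [lambda: rng.uniform(0.0, 1.0), lambda: rng.uniform(0.45, 0.55)]
counts, means = ucb_variance_sampling(arms, budget=400)
```

In this toy setting the high-variance arm should receive more of the budget than the low-variance one. With a small budget the confidence bonus dominates the low-variance arm's score, so the adaptive counts stay well away from the known-variance allocation returned by `optimal_allocation`.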
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques
