Identifying the Best Transition Law
Mehrasa Ahmadipour, Élise Crepon, Aurélien Garivier

TL;DR
This paper investigates best-arm identification in bandit problems with multinomial rewards on a known support, a setting motivated by recursive learning in Markov Decision Processes. It compares classical non-parametric confidence intervals with per-dimension deviation bounds and an empirical likelihood method.
Contribution
It introduces and evaluates EL-LUCB, a novel approach using empirical likelihood for joint probability estimation in bandit problems with known support.
Findings
EL-LUCB outperforms classical methods in complex scenarios
Empirical likelihood provides tighter confidence bounds
The strategies identify the best arm in simulated scenarios of varying structural complexity
Abstract
Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with the use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
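To make the "deviation bounds on each dimension independently" idea concrete, here is a minimal Python sketch of how a per-coordinate Hoeffding bound could be turned into an upper confidence bound on an arm's mean reward when the support is known. It is an illustration under stated assumptions, not the paper's implementation: the function names (`hoeffding_radius`, `mean_upper_bound`), the union bound over coordinates, and the fixed confidence level `delta` are all choices made here for the example. A LUCB-style strategy would typically use a time-dependent confidence level instead.

```python
import math


def hoeffding_radius(n, delta, k):
    """Two-sided Hoeffding radius for one coordinate of the probability vector.

    Assumption for this sketch: a union bound over the k coordinates, so each
    coordinate is allotted a failure probability of delta / k.
    """
    return math.sqrt(math.log(2 * k / delta) / (2 * n))


def mean_upper_bound(counts, support, delta):
    """Largest mean reward compatible with the per-coordinate intervals.

    counts[i]  -- number of times outcome support[i] was observed for this arm
    support[i] -- known reward value of outcome i
    """
    n = sum(counts)
    k = len(counts)
    r = hoeffding_radius(n, delta, k)
    p_hat = [c / n for c in counts]

    # Per-coordinate confidence intervals, clipped to [0, 1].
    lo = [max(0.0, p - r) for p in p_hat]
    hi = [min(1.0, p + r) for p in p_hat]

    # Maximise <p, support> over the box intersected with the simplex:
    # start from the lower bounds, then give the remaining probability mass
    # to the outcomes with the largest reward first.
    p = lo[:]
    remaining = max(0.0, 1.0 - sum(lo))
    for i in sorted(range(k), key=lambda j: support[j], reverse=True):
        add = min(hi[i] - lo[i], remaining)
        p[i] += add
        remaining -= add

    return sum(pi * xi for pi, xi in zip(p, support))


# Hypothetical arm with three outcomes and rewards 0, 0.5 and 1.
ub_small = mean_upper_bound([50, 30, 20], [0.0, 0.5, 1.0], delta=0.05)
ub_large = mean_upper_bound([5000, 3000, 2000], [0.0, 0.5, 1.0], delta=0.05)
```

Both bounds lie above the empirical mean of 0.35, and the bound computed from 10,000 samples is tighter than the one computed from 100. Treating each coordinate separately ignores the joint structure of the probability vector, which is the gap the abstract says the Empirical Likelihood method addresses by working on the joint probability vector.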
