Identifying the Best Transition Law

Mehrasa Ahmadipour; Élise Crepon; Aurélien Garivier

arXiv:2502.12227 · cs.LG · February 19, 2025

TL;DR

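This paper investigates best-arm identification in bandit problems with multinomial rewards on a known support, motivated by recursive learning in Markov Decision Processes. It compares LUCB-type strategies built on classical non-parametric confidence intervals, on per-dimension Hoeffding and Bernstein bounds, and on empirical likelihood.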
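To make the best-arm identification setting concrete, here is a minimal LUCB-style loop in Python that uses classical Hoeffding intervals on the mean reward, i.e. it does not exploit the known support. It is a sketch under stated assumptions, not the paper's code: rewards are assumed to lie in [0, 1], the exploration rate log(5Kt⁴/(4δ)) is an LUCB1-style choice, and the names `hoeffding_interval`, `lucb` and the example arms are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def hoeffding_interval(mean: float, n: int, t: int, k: int, delta: float) -> Tuple[float, float]:
    # Assumption: rewards lie in [0, 1]; LUCB1-style exploration rate log(5 k t^4 / (4 delta)).
    width = math.sqrt(math.log(5 * k * t ** 4 / (4 * delta)) / (2 * n))
    return mean - width, mean + width


def lucb(sample_arm: Callable[[int], float], k: int, delta: float,
         max_samples: int = 100_000) -> Tuple[int, int]:
    """Return (recommended arm, total number of samples). Requires k >= 2."""
    counts: List[int] = [0] * k
    sums: List[float] = [0.0] * k

    def pull(a: int) -> None:
        sums[a] += sample_arm(a)
        counts[a] += 1

    for a in range(k):  # one initial sample per arm
        pull(a)
    t = k
    while t < max_samples:
        means = [s / c for s, c in zip(sums, counts)]
        bounds = [hoeffding_interval(means[a], counts[a], t, k, delta) for a in range(k)]
        best = max(range(k), key=lambda a: means[a])
        # Challenger: the other arm with the highest upper confidence bound.
        challenger = max((a for a in range(k) if a != best), key=lambda a: bounds[a][1])
        if bounds[best][0] >= bounds[challenger][1]:
            return best, t  # the empirical best arm is separated from all others
        pull(best)
        pull(challenger)
        t += 2
    means = [s / c for s, c in zip(sums, counts)]
    return max(range(k), key=lambda a: means[a]), t  # budget exhausted


# Hypothetical multinomial arms on the support {0, 0.5, 1}; arm 1 has the largest mean (0.7).
support = [0.0, 0.5, 1.0]
arms = [[0.6, 0.2, 0.2], [0.2, 0.2, 0.6], [0.4, 0.4, 0.2]]
rng = random.Random(0)
best, used = lucb(lambda a: rng.choices(support, weights=arms[a])[0], k=3, delta=0.05)
print(best, used)
```

Only the interval function depends on what is known about the rewards, so a support-aware upper and lower bound can be swapped in for `hoeffding_interval` without touching the sampling and stopping rules.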

Contribution

It introduces and evaluates EL-LUCB, an approach that applies the empirical likelihood method to the joint probability vector of each arm's reward distribution, exploiting the known support.
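One way an empirical-likelihood region on the joint probability vector can be turned into an upper confidence bound on an arm's mean is sketched below. For counts n_i on the known support x_1, ..., x_K with empirical frequencies p̂, the empirical likelihood ratio of a candidate vector q is Π(q_i/p̂_i)^{n_i}, whose logarithm equals −n·KL(p̂‖q). Assuming the region {q : n·KL(p̂‖q) ≤ β}, the largest mean in it is sup{μ : K_inf(p̂, μ) ≤ β/n}, computed here by bisection on μ and the Honda-Takemura dual of K_inf. The threshold β (for example half a χ² quantile, or an exploration rate) and the function names are assumptions; this is not the authors' EL-LUCB implementation.

```python
import math
from typing import Sequence


def kl_inf(p: Sequence[float], support: Sequence[float], mu: float) -> float:
    """min KL(p || q) over distributions q on `support` with mean >= mu.

    Computed through the Honda-Takemura dual:
        max_{0 <= lam <= 1 / (x_max - mu)} sum_i p_i * log(1 - lam * (x_i - mu)).
    """
    x_max = max(support)
    mean = sum(pi * xi for pi, xi in zip(p, support))
    if mu <= mean:
        return 0.0  # p itself already satisfies the constraint
    if mu >= x_max:
        return math.inf  # no distribution on the support reaches this mean

    def dual(lam: float) -> float:
        # Concave in lam and finite on [0, 1 / (x_max - mu)) since every x_i <= x_max.
        return sum(pi * math.log1p(-lam * (xi - mu))
                   for pi, xi in zip(p, support) if pi > 0)

    lo, hi = 0.0, 1.0 / (x_max - mu)
    for _ in range(200):  # ternary search for the maximiser of a concave function
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return dual((lo + hi) / 2)


def el_upper_bound(counts: Sequence[int], support: Sequence[float], beta: float) -> float:
    """Largest mean of any q on `support` with n * KL(p_hat || q) <= beta."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    lo = sum(pi * xi for pi, xi in zip(p_hat, support))  # empirical mean
    hi = max(support)
    for _ in range(100):  # bisection: kl_inf is nondecreasing in mu above the mean
        mid = (lo + hi) / 2
        if n * kl_inf(p_hat, support, mid) <= beta:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical arm: 10 draws on the support {0, 0.5, 1}.
print(el_upper_bound([5, 3, 2], [0.0, 0.5, 1.0], beta=1.0))
```

With the support {0, 1} the bound reduces to the Bernoulli KL-UCB index, which makes a convenient sanity check.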

Findings

1. EL-LUCB outperforms classical methods in scenarios with greater structural complexity.
2. Empirical likelihood provides tighter confidence bounds.
3. The strategies effectively identify the best arm across reward structures of varying complexity.

Abstract

Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance of several strategies, notably LUCB, first without and then with the use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
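The per-dimension approach described in the abstract can be illustrated as follows (a sketch, not the authors' code): build a confidence interval for each coordinate p_i separately, union-bound over the K coordinates, then take the largest mean reward over all probability vectors inside the resulting box. The Hoeffding width is standard; the Bernstein width uses an assumed form based on Maurer and Pontil's empirical Bernstein bound, and the union bound, constants, and names (`box_upper_bound`, `hoeffding_width`, `empirical_bernstein_width`) are hypothetical and may differ from the paper's choices.

```python
import math
from typing import Callable, Sequence

# width_fn(p_hat_i, n, delta, k) -> half-width of the interval for one coordinate p_i
WidthFn = Callable[[float, int, float, int], float]


def hoeffding_width(p_hat: float, n: int, delta: float, k: int) -> float:
    # Two-sided Hoeffding bound on a mean of n indicators, union bound over k coordinates.
    return math.sqrt(math.log(2 * k / delta) / (2 * n))


def empirical_bernstein_width(p_hat: float, n: int, delta: float, k: int) -> float:
    # Assumed form: Maurer and Pontil's empirical Bernstein bound, made two-sided
    # and union-bounded over k coordinates. Requires n >= 2.
    log_term = math.log(4 * k / delta)
    var = p_hat * (1.0 - p_hat) * n / (n - 1)  # unbiased sample variance of the indicators
    return math.sqrt(2 * var * log_term / n) + 7 * log_term / (3 * (n - 1))


def box_upper_bound(counts: Sequence[int], support: Sequence[float],
                    delta: float, width_fn: WidthFn) -> float:
    """Largest mean reward over probability vectors q with |q_i - p_hat_i| <= w_i."""
    n, k = sum(counts), len(counts)
    p_hat = [c / n for c in counts]
    widths = [width_fn(p, n, delta, k) for p in p_hat]
    lower = [max(0.0, p - w) for p, w in zip(p_hat, widths)]
    upper = [min(1.0, p + w) for p, w in zip(p_hat, widths)]
    q = list(lower)
    remaining = max(0.0, 1.0 - sum(lower))
    # Linear objective: give the leftover mass to the largest rewards first.
    for i in sorted(range(k), key=lambda j: support[j], reverse=True):
        add = min(upper[i] - lower[i], remaining)
        q[i] += add
        remaining -= add
    return sum(qi * xi for qi, xi in zip(q, support))


support = [0.0, 0.5, 1.0]
counts = [980, 10, 10]  # hypothetical, heavily skewed arm
print(box_upper_bound(counts, support, 0.1, hoeffding_width))
print(box_upper_bound(counts, support, 0.1, empirical_bernstein_width))
```

The greedy step is exact because the objective is linear over the intersection of a box and the simplex. For skewed counts like these, the variance-aware Bernstein width yields a smaller bound than Hoeffding, since rare outcomes have small variance.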
