Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays

Aleksandr Vorobev; Gleb Gusev

arXiv:1507.04910·cs.LG·July 20, 2015


TL;DR

This paper derives regret lower bounds for a multi-armed bandit problem in which the order of the selected arms affects the rewards. It also gives algorithms that attain these bounds, which shows that the bounds are tight.

Contribution

It derives new regret lower bounds, together with algorithms that achieve them, for a novel variant of the multi-armed bandit problem with ordered, non-equivalent multiple plays.

Findings

01

Regret lower bounds with the standard $O(\log t)$ asymptotics but novel coefficients, for several formulations of the problem.

02

Optimal algorithms whose regret matches the lower bounds.

03

Because the algorithms attain the bounds, the bounds are tight and cannot be improved.

Abstract

We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms but also their order, which influences the reward distribution. In several problem formulations with different assumptions, we provide lower bounds for regret with the standard asymptotics $O(\log t)$ but novel coefficients, and we provide optimal algorithms, thus proving that these bounds cannot be improved.
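To make "the order influences the reward distribution" concrete, here is a minimal sketch under an assumed position-based model. This is one common way to instantiate ordered multiple plays and is not necessarily one of the paper's formulations. Each arm i has a hypothetical attractiveness theta[i]. Each position k is examined with a hypothetical probability kappa[k], and an examined arm pays 1 with probability theta[i]. The function names (expected_reward, optimal_order, sample_reward, pseudo_regret) are illustrative only. The sketch shows that two actions with the same set of arms but different orders earn different expected rewards, and that regret is measured against the best ordered action.

```python
import random

# Assumed position-based model (illustrative, not taken from the paper):
# position k is examined with probability kappa[k]; an examined arm i
# yields reward 1 with probability theta[i]. Assumes len(kappa) <= len(theta).

def expected_reward(order, theta, kappa):
    """Expected reward of an ordered action; order[k] is the arm at position k."""
    return sum(k_prob * theta[arm] for k_prob, arm in zip(kappa, order))

def optimal_order(theta, kappa):
    """Best ordered action: most attractive arms go to most-examined positions
    (optimal by the rearrangement inequality)."""
    arms_by_theta = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)
    positions_by_kappa = sorted(range(len(kappa)), key=lambda k: kappa[k], reverse=True)
    order = [None] * len(kappa)
    for arm, pos in zip(arms_by_theta, positions_by_kappa):
        order[pos] = arm
    return order

def sample_reward(order, theta, kappa, rng):
    """Draw one realized reward for an ordered action."""
    total = 0
    for k_prob, arm in zip(kappa, order):
        # The position must be examined, then the arm must pay off.
        if rng.random() < k_prob and rng.random() < theta[arm]:
            total += 1
    return total

def pseudo_regret(actions, theta, kappa):
    """Cumulative expected shortfall of a sequence of ordered actions."""
    best = expected_reward(optimal_order(theta, kappa), theta, kappa)
    return sum(best - expected_reward(a, theta, kappa) for a in actions)

if __name__ == "__main__":
    theta = [0.9, 0.5, 0.2]  # hypothetical arm attractiveness
    kappa = [1.0, 0.5]       # hypothetical position examination probabilities
    print(optimal_order(theta, kappa))             # the same set {0, 1}...
    print(expected_reward([0, 1], theta, kappa))   # ...in the better order
    print(expected_reward([1, 0], theta, kappa))   # ...in the worse order
    rng = random.Random(0)
    draws = [sample_reward([0, 1], theta, kappa, rng) for _ in range(10_000)]
    print(sum(draws) / len(draws))                 # Monte Carlo estimate of the mean
```

With these values, the ordered action [0, 1] has expected reward 1.0·0.9 + 0.5·0.5 = 1.15, while [1, 0] (the same two arms, swapped) has 1.0·0.5 + 0.5·0.9 = 0.95. Always playing the swapped order therefore accumulates pseudo-regret of about 0.2 per step. A learning algorithm must identify both which arms to select and where to place them. The paper's lower bounds quantify the logarithmic cost of that exploration.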


Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications