TL;DR
This paper introduces a multi-armed bandit framework in which every arm's reward depends on a shared latent random source. The resulting correlation lets the algorithm identify non-competitive arms and stop exploring them, which lowers regret.
Contribution
It proposes a generalized UCB algorithm that exploits arm correlations via a latent source, reducing the problem complexity and achieving constant regret in certain regimes.
Findings
The algorithm pulls each non-competitive arm only a constant number of times, independent of the horizon.
In some regimes, the algorithm attains constant regret rather than the logarithmic regret typical of multi-armed bandit algorithms.
Theoretical analysis confirms the algorithm's optimality in certain settings.
Abstract
We consider a novel multi-armed bandit framework where the rewards obtained by pulling the arms are functions of a common latent random variable. The correlation between arms due to the common random source can be used to design a generalized upper-confidence-bound (UCB) algorithm that identifies certain arms as non-competitive, and avoids exploring them. As a result, we reduce a $K$-armed bandit problem to a $(C+1)$-armed problem, where $C+1$ includes the best arm and $C$ competitive arms. Our regret analysis shows that the competitive arms need to be pulled $O(\log T)$ times, while the non-competitive arms are pulled only $O(1)$ times. As a result, there are regimes where our algorithm achieves a $O(1)$ regret as opposed to the typical logarithmic regret scaling of multi-armed bandit algorithms. We also evaluate lower bounds on the expected regret and…
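The abstract does not spell out how arms are ruled out, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' algorithm. It assumes the reward functions `g[a]` are known, the latent variable takes finitely many values (`support`), the most-pulled arm serves as the reference, and a standard UCB1 bonus is used on the remaining arms. The names `pseudo_reward` and `correlated_ucb_sketch` are hypothetical.

```python
import math
import random


def pseudo_reward(g, ell, k, r, support):
    """Largest reward arm `ell` could have paid, given that arm `k` paid `r`.

    Assumption of this sketch: every g[a] is a known function on a finite support.
    """
    return max(g[ell](x) for x in support if g[k](x) == r)


def correlated_ucb_sketch(g, support, probs, horizon, seed=0):
    """Return the number of pulls per arm after `horizon` rounds (horizon >= len(g))."""
    rng = random.Random(seed)
    K = len(g)
    pulls = [0] * K
    sums = [0.0] * K
    # phi[a][k]: running sum of pseudo-rewards of arm a inferred from pulls of arm k
    phi = [[0.0] * K for _ in range(K)]

    for t in range(1, horizon + 1):
        if t <= K:
            k = t - 1  # initialization: pull every arm once
        else:
            lead = max(range(K), key=lambda a: pulls[a])  # most-pulled arm
            mu_lead = sums[lead] / pulls[lead]
            # Keep an arm only if its optimistic estimate can still match the leader.
            competitive = [a for a in range(K)
                           if a == lead or phi[a][lead] / pulls[lead] >= mu_lead]
            # Standard UCB1 index, restricted to the competitive set.
            k = max(competitive,
                    key=lambda a: sums[a] / pulls[a]
                    + math.sqrt(2 * math.log(t) / pulls[a]))
        x = rng.choices(support, weights=probs)[0]  # latent draw, hidden from the learner
        r = g[k](x)
        pulls[k] += 1
        sums[k] += r
        for a in range(K):
            phi[a][k] += pseudo_reward(g, a, k, r, support)
    return pulls


# Toy instance (hypothetical): arm 0 pays more than the other two for every value of X.
support = [1, 2, 3]
probs = [1 / 3, 1 / 3, 1 / 3]
g = [lambda x: x / 3, lambda x: x / 6, lambda x: (x - 1) / 3]
pulls = correlated_ucb_sketch(g, support, probs, horizon=200)
print(pulls)
```

In this toy instance arm 0's reward pins down the latent value exactly, so the two dominated arms are pulled once each during initialization and never again. An instance whose leading arm is less informative would keep more arms in the competitive set, and those arms would then be explored at the usual UCB rate.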
