Best-Arm Identification in Correlated Multi-Armed Bandits
Samarth Gupta, Gauri Joshi, Osman Yağan

TL;DR
This paper introduces C-LUCB, a new algorithm for best-arm identification in correlated multi-armed bandits. It uses partial knowledge of the correlations between arms to significantly reduce sample complexity compared with methods that assume independent rewards.
Contribution
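The paper proposes a novel correlated bandit framework, which encodes domain knowledge as upper bounds on the expected conditional reward of one arm given a reward realization from another. It also proposes C-LUCB, a generalization of LUCB that exploits this information to improve sample efficiency in best-arm identification.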
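As a worked illustration of why such bounds help (the notation $s_{\ell,k}$, $\hat{\phi}_{\ell,k}$, $r_{k,t}$ and $n_k$ is assumed here, not taken from this summary): if a known bound $s_{\ell,k}(r)$ on the expected conditional reward of arm $\ell$ holds for every reward $r$ of arm $k$, the tower property turns it into an upper bound on the mean $\mu_\ell$. That bound can then be estimated from arm $k$'s samples $r_{k,1},\dots,r_{k,n_k}$ alone.

```latex
% Assumed notation: s_{\ell,k}(r) is the known upper bound on E[R_\ell | R_k = r]
s_{\ell,k}(r) \;\ge\; \mathbb{E}\left[R_\ell \mid R_k = r\right] \;\; \forall r
\quad\Longrightarrow\quad
\mathbb{E}\left[s_{\ell,k}(R_k)\right] \;\ge\; \mathbb{E}\left[\mathbb{E}\left[R_\ell \mid R_k\right]\right] \;=\; \mu_\ell ,
\qquad
\hat{\phi}_{\ell,k} \;=\; \frac{1}{n_k}\sum_{t=1}^{n_k} s_{\ell,k}\!\left(r_{k,t}\right).
```

In a framework of this kind, samples of one arm can rule out another arm without pulling it. Once $\hat{\phi}_{\ell,k}$ plus a confidence margin falls below a competitor's lower confidence bound, arm $\ell$ cannot be the best arm (with high probability).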
Findings
C-LUCB reduces sample complexity by focusing on a subset of competitive arms.
Theoretical analysis shows that the sample complexity depends on the number of competitive arms rather than on the total number of arms.
Experimental results on recommendation datasets validate the efficiency of C-LUCB.
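The notion of competitive arms in the findings can be illustrated with a small sketch. One natural definition is assumed here, since the summary does not state one: arm $k$ is competitive if its expected pseudo-reward with respect to the best arm $k^*$ is at least $\mu_{k^*}$. Otherwise, samples of the best arm alone eventually push arm $k$'s upper bound below $\mu_{k^*}$. The function name and inputs are hypothetical.

```python
from typing import Dict, List, Sequence


def competitive_arms(means: Sequence[float], pseudo_vs_best: Dict[int, float]) -> List[int]:
    """Return the (hypothetical) competitive set C, which always includes the best arm.

    means[k]: true mean reward mu_k of arm k.
    pseudo_vs_best[k]: expected pseudo-reward of arm k with respect to the best
        arm, i.e. E[s_{k,k*}(R_{k*})], which upper-bounds mu_k.
    """
    best = max(range(len(means)), key=lambda k: means[k])
    mu_star = means[best]
    competitive = [best]
    for k in range(len(means)):
        # Arm k stays competitive if the best arm's samples alone cannot
        # push its pseudo-reward upper bound below mu*.
        if k != best and pseudo_vs_best[k] >= mu_star:
            competitive.append(k)
    return sorted(competitive)
```

For example, `competitive_arms([0.7, 0.5, 0.4, 0.3], {1: 0.8, 2: 0.6, 3: 0.5})` returns `[0, 1]`. In this toy instance only arm 1 has to be separated from the best arm through its own samples, while arms 2 and 3 can be ruled out through pseudo-rewards computed from arm 0.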
Abstract
In this paper we consider the problem of best-arm identification in multi-armed bandits in the fixed-confidence setting, where the goal is to identify, with probability $1-\delta$ for some $\delta > 0$, the arm with the highest mean reward using the minimum possible number of samples from the set of arms $\mathcal{K}$. Most existing best-arm identification algorithms and analyses assume that the rewards of different arms are independent of each other. We propose a novel correlated bandit framework that captures domain knowledge about correlations between arms in the form of upper bounds on the expected conditional reward of an arm, given a reward realization from another arm. Our proposed algorithm, C-LUCB, which generalizes the LUCB algorithm, uses this partial knowledge of correlations to sharply reduce the sample complexity of best-arm identification.
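The abstract describes C-LUCB only as a generalization of LUCB that uses the correlation bounds. The following is a minimal, hypothetical sketch of how such a generalization might look, not the authors' algorithm. It runs an LUCB1-style loop that samples the empirical leader and the strongest challenger, then stops once the leader's lower confidence bound exceeds every challenger's upper confidence bound. Each arm's upper bound is tightened with pseudo-reward estimates computed from other arms' samples. The following are all assumptions: the confidence radius, the extra `n_arms**2` union-bound factor, the discrete latent-variable reward model, and the names `c_lucb_sketch`, `build_pseudo`, `make_pull` and `radius`.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# pseudo[(l, k)][r]: assumed-known upper bound on E[R_l | R_k = r].
Pseudo = Dict[Tuple[int, int], Dict[int, float]]


def build_pseudo(f: Sequence[Sequence[int]]) -> Pseudo:
    """Toy model (hypothetical): latent X uniform on range(m), arm k returns f[k][X].

    The max of f[l][x] over {x : f[k][x] == r} upper-bounds E[R_l | R_k = r].
    """
    n, m = len(f), len(f[0])
    pseudo: Pseudo = {}
    for l in range(n):
        for k in range(n):
            if l != k:
                pseudo[(l, k)] = {
                    r: max(f[l][x] for x in range(m) if f[k][x] == r) for r in set(f[k])
                }
    return pseudo


def make_pull(f: Sequence[Sequence[int]], rng: random.Random) -> Callable[[int], int]:
    """Each pull draws a fresh latent X and reveals only the chosen arm's reward."""
    return lambda k: f[k][rng.randrange(len(f[k]))]


def radius(u: int, t: int, n_arms: int, delta: float) -> float:
    """LUCB1-style Hoeffding radius; the n_arms**2 factor is an assumed union bound."""
    return math.sqrt(math.log(1.25 * n_arms**2 * t**4 / delta) / (2 * u))


def c_lucb_sketch(pull: Callable[[int], int], n_arms: int, pseudo: Pseudo,
                  delta: float = 0.05, max_rounds: int = 100_000) -> Tuple[int, int]:
    """Return (identified arm, total number of samples used)."""
    samples: List[List[int]] = [[pull(k)] for k in range(n_arms)]  # one pull per arm
    for t in range(1, max_rounds + 1):
        means = [sum(s) / len(s) for s in samples]

        def ucb(a: int) -> float:
            # Arm a's own upper bound, tightened by pseudo-reward bounds from other arms.
            bound = means[a] + radius(len(samples[a]), t, n_arms, delta)
            for k in range(n_arms):
                if k != a and (a, k) in pseudo:
                    phi = sum(pseudo[(a, k)][r] for r in samples[k]) / len(samples[k])
                    bound = min(bound, phi + radius(len(samples[k]), t, n_arms, delta))
            return bound

        leader = max(range(n_arms), key=lambda a: means[a])
        challenger = max((a for a in range(n_arms) if a != leader), key=ucb)
        # Stop when the leader's lower bound clears the strongest challenger's upper bound.
        if means[leader] - radius(len(samples[leader]), t, n_arms, delta) > ucb(challenger):
            return leader, sum(len(s) for s in samples)
        samples[leader].append(pull(leader))
        samples[challenger].append(pull(challenger))
    raise RuntimeError("no decision within max_rounds")
```

For a toy run, take `f = [[1, 1, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]` (arm means 0.75, 0.25, 0.25) and call `c_lucb_sketch(make_pull(f, random.Random(0)), 3, build_pseudo(f))`. Arm 1's pseudo-rewards given arm 0 are informative, so its upper bound shrinks from arm 0's samples alone. Passing an empty `pseudo` dict instead reduces the loop to a plain LUCB1-style procedure with the more conservative radius.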
