A KL-LUCB Bandit Algorithm for Large-Scale Crowdsourcing
Bob Mankoff, Robert Nowak, Ervin Tanczos

TL;DR
This paper introduces a new bandit algorithm combining lil-UCB and KL-LUCB for efficient best-arm identification in large-scale crowdsourcing, supported by theoretical bounds and experimental validation.
Contribution
It proves a novel anytime confidence bound for the mean of bounded distributions and uses it to fuse lil-UCB and KL-LUCB into a single best-arm identification method for large-scale crowdsourcing.
Findings
Proves a new confidence bound for bounded rewards
Demonstrates improved identification efficiency in experiments
Corroborates the theoretical results with numerical experiments based on the New Yorker Cartoon Caption Contest
Abstract
This paper focuses on best-arm identification in multi-armed bandits with bounded rewards. We develop an algorithm that is a fusion of lil-UCB and KL-LUCB, offering the best qualities of the two algorithms in one method. This is achieved by proving a novel anytime confidence bound for the mean of bounded distributions, which is the analogue of the LIL-type bounds recently developed for sub-Gaussian distributions. We corroborate our theoretical results with numerical experiments based on the New Yorker Cartoon Caption Contest.
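To make the idea of a KL-based anytime confidence bound more concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes rewards in [0, 1] and uses the Bernoulli KL divergence; the function names and the exploration rate in `lil_threshold` (including its constants) are hypothetical placeholders chosen only to show a log-log dependence on the sample count, not the bound proved in the paper.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def lil_threshold(n, delta):
    """Hypothetical LIL-style exploration rate (illustrative constants only).

    Grows like log(1/delta) + log log n; the paper's actual rate is not reproduced here.
    """
    return math.log(1.0 / delta) + 3.0 * math.log(math.log(n + math.e))


def kl_upper_bound(mean, n, threshold, tol=1e-9):
    """Largest q in [mean, 1] with n * KL(mean, q) <= threshold, found by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if n * bernoulli_kl(mean, mid) <= threshold:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo


def kl_lower_bound(mean, n, threshold, tol=1e-9):
    """Smallest q in [0, mean] with n * KL(mean, q) <= threshold, found by bisection."""
    lo, hi = 0.0, mean
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if n * bernoulli_kl(mean, mid) <= threshold:
            hi = mid  # mid is still inside the confidence region
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Example: an arm with empirical mean 0.1 after 100 pulls.
    n, mean, delta = 100, 0.1, 0.05
    thr = lil_threshold(n, delta)
    print(kl_lower_bound(mean, n, thr), kl_upper_bound(mean, n, thr))
```

By Pinsker's inequality, an interval of this KL form is never wider than the Hoeffding-style interval built from the same threshold, and it is noticeably tighter when the empirical mean is close to 0 or 1. In a LUCB-style procedure, bounds like these would be compared across arms to decide when the leading arm is separated from the rest.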
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference
