Identifying Copeland Winners in Dueling Bandits with Indifferences
Viktor Bengs, Björn Haddenhorst, Eyke Hüllermeier

TL;DR
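This paper studies the problem of identifying the Copeland winner(s) in dueling bandits where a duel may also end in indifference. It proposes POCOWISTA, an algorithm whose sample complexity almost matches a lower bound established in the paper, reports excellent empirical performance, and gives a refined version with an improved worst-case bound under a specific type of stochastic transitivity.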
Contribution
It introduces POCOWISTA, a novel algorithm for Copeland winner identification in dueling bandits with indifferences, along with theoretical bounds and empirical validation.
Findings
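The proposed POCOWISTA algorithm has a sample complexity that almost matches the lower bound.
A lower bound is established on the sample complexity of any learning algorithm that finds the Copeland winner(s) with a fixed error probability.
A refined version achieves an improved worst-case sample complexity when the preference probabilities satisfy a specific type of stochastic transitivity.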
Abstract
We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.
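To make the target of the identification task concrete, the following is a minimal sketch of how Copeland winner(s) could be read off when the ternary outcome probabilities are known. It is not POCOWISTA and involves no learning. The abstract does not give the exact scoring rule, so the rule used here is an assumption: an arm earns 1 point for each opponent it most likely beats and `tie_weight` points (0.5 by default) for each opponent where indifference is the most likely outcome. The function name `copeland_winners` and the example matrices are hypothetical.

```python
from typing import List, Sequence, Tuple


def copeland_winners(
    p_win: Sequence[Sequence[float]],
    p_tie: Sequence[Sequence[float]],
    tie_weight: float = 0.5,  # assumption: credit given for a most-likely indifference
) -> Tuple[List[float], List[int]]:
    """Compute Copeland scores and winners from known ternary probabilities.

    p_win[i][j] is the probability that arm i is strictly preferred to arm j.
    p_tie[i][j] is the probability of indifference between i and j (symmetric).
    The probability that j is strictly preferred to i is read from p_win[j][i].
    """
    n = len(p_win)
    scores = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            win, tie, loss = p_win[i][j], p_tie[i][j], p_win[j][i]
            if win > max(tie, loss):
                scores[i] += 1.0         # i most likely beats j
            elif tie > max(win, loss):
                scores[i] += tie_weight  # indifference is the most likely outcome
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return scores, winners


# Hypothetical 3-arm instance; for each pair, win + tie + loss = 1.
p_win = [
    [0.0, 0.6, 0.5],
    [0.3, 0.0, 0.2],
    [0.3, 0.2, 0.0],
]
p_tie = [
    [0.0, 0.1, 0.2],
    [0.1, 0.0, 0.6],
    [0.2, 0.6, 0.0],
]
scores, winners = copeland_winners(p_win, p_tie)
print(scores, winners)
```

In this instance arm 0 most likely beats both other arms, while arms 1 and 2 are most likely indifferent to each other, so arm 0 is the unique Copeland winner under the assumed scoring rule. A learner in the bandit setting does not see these probabilities and has to estimate the most likely outcome of each pair from sampled duels.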
Taxonomy
Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
