Reducing Dueling Bandits to Cardinal Bandits
Nir Ailon, Thorsten Joachims, Zohar Karnin

TL;DR
This paper introduces algorithms that convert the Dueling Bandits problem into the well-studied stochastic Multi-Armed Bandits problem, enabling the use of existing algorithms and analysis tools for ordinal feedback scenarios.
Contribution
The paper proposes three reduction algorithms, proves regret bounds for two of them, and reports empirical improvements over previous Dueling Bandits methods.
Findings
Regret bounds established for two reduction algorithms.
Empirical results show superior performance of the proposed methods.
First almost optimal regret bound that accounts for second-order terms, such as the differences between arm values.
Abstract
We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions -- named Doubler, MultiSBM and Sparring -- provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSBM we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of Sparring which empirically outperforms the other two as…
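To make the ordinal-to-cardinal idea concrete, here is a minimal sketch of one way a duel outcome could be turned into a 0/1 reward for off-the-shelf cardinal bandit learners. It is an illustration only and is not claimed to reproduce any of the paper's three reductions or their guarantees. Everything named here is an assumption: the `UCB1` class stands in for an arbitrary black-box Multi-Armed Bandit algorithm, `duel` uses a hypothetical linear preference model in which arm a beats arm b with probability (1 + u_a - u_b) / 2, and `run_reduction` is a made-up driver that lets two learners each pick one side of the duel.

```python
import math
import random


class UCB1:
    """Standard UCB1 learner over a finite set of arms, rewards in [0, 1].

    Used here only as a stand-in for any black-box cardinal bandit algorithm.
    """

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before relying on confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        return max(
            range(len(self.counts)),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def duel(utilities, a, b, rng):
    """Assumed preference model: a beats b with prob (1 + u_a - u_b) / 2.

    Utilities must lie in [0, 1] so that the probability is valid.
    """
    p = 0.5 * (1.0 + utilities[a] - utilities[b])
    return rng.random() < p


def run_reduction(utilities, horizon, seed=0):
    """Hypothetical driver: two cardinal learners each choose one duelist.

    The ordinal outcome "a is preferred to b" becomes a cardinal reward of
    1 for the learner that picked the winner and 0 for the other learner.
    Returns how often each arm took part in a duel.
    """
    rng = random.Random(seed)
    k = len(utilities)
    left, right = UCB1(k), UCB1(k)
    plays = [0] * k
    for _ in range(horizon):
        a, b = left.select(), right.select()
        a_wins = duel(utilities, a, b, rng)
        left.update(a, 1.0 if a_wins else 0.0)
        right.update(b, 0.0 if a_wins else 1.0)
        plays[a] += 1
        plays[b] += 1
    return plays
```

Calling `run_reduction([0.1, 0.3, 0.9], horizon=3000)` returns per-arm play counts. Under the assumed preference model, the arm with the highest utility should come to dominate both sides of the duel. Swapping `UCB1` for another cardinal bandit learner only requires matching the `select` and `update` methods, which is the practical appeal of a reduction-style design.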
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Auction Theory and Applications
