Reducing Dueling Bandits to Cardinal Bandits

Nir Ailon; Thorsten Joachims; Zohar Karnin

arXiv:1405.3396·cs.LG·May 15, 2014·35 cites

TL;DR

This paper introduces algorithms that convert the Dueling Bandits problem into the well-studied stochastic Multi-Armed Bandits problem, enabling the use of existing algorithms and analysis tools for ordinal feedback scenarios.

Contribution

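The paper proposes three reduction algorithms: Doubler, MultiSbm and DoubleSbm. It proves regret upper bounds for Doubler and MultiSbm, and reports empirical improvements over previous Dueling Bandits methods, with DoubleSbm outperforming the other two reductions.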

Findings

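01. Regret upper bounds are proved for two of the reductions, Doubler and MultiSbm, in both finite and infinite settings.

02. Empirically, the proposed methods outperform previous Dueling Bandits algorithms; the performance of DoubleSbm is conjectured rather than proved.

03. First almost optimal regret bound in terms of second order terms, such as the differences between the values of the arms.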

Abstract

We present algorithms for reducing the Dueling Bandits problem to the conventional (stochastic) Multi-Armed Bandits problem. The Dueling Bandits problem is an online model of learning with ordinal feedback of the form "A is preferred to B" (as opposed to cardinal feedback like "A has value 2.5"), giving it wide applicability in learning from implicit user feedback and revealed and stated preferences. In contrast to existing algorithms for the Dueling Bandits problem, our reductions, named Doubler, MultiSbm and DoubleSbm, provide a generic schema for translating the extensive body of known results about conventional Multi-Armed Bandit algorithms to the Dueling Bandits setting. For Doubler and MultiSbm we prove regret upper bounds in both finite and infinite settings, and conjecture about the performance of DoubleSbm which empirically outperforms the other two as…
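
To make the ordinal-versus-cardinal distinction concrete, the following is a minimal sketch and is not taken from the paper. It assumes each arm has a hidden utility in [0, 1] and that a duel is won with probability (1 + u_x - u_y) / 2 (a linear link, chosen here for illustration). Two standard UCB1 learners each pick one arm per round, and only the win/loss outcome of the duel is fed back to them as a 0/1 reward. The names UCB1, duel and spar are hypothetical, and the loop illustrates only the general idea of driving cardinal bandit learners with duel outcomes, not the exact procedure of Doubler, MultiSbm or DoubleSbm.

```python
import math
import random


class UCB1:
    """Textbook UCB1 learner for rewards in [0, 1] (a cardinal bandit)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        # Play every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def duel(utilities, x, y, rng):
    """Ordinal feedback: True if x is preferred to y.

    Assumption: linear link, P(x beats y) = (1 + u_x - u_y) / 2.
    """
    p_x_wins = (1.0 + utilities[x] - utilities[y]) / 2.0
    return rng.random() < p_x_wins


def spar(utilities, horizon, seed=0):
    """Let two UCB1 learners choose the two sides of each duel.

    Returns how often each arm was played (summed over both sides).
    """
    rng = random.Random(seed)
    n_arms = len(utilities)
    left, right = UCB1(n_arms), UCB1(n_arms)
    plays = [0] * n_arms
    for _ in range(horizon):
        x, y = left.select(), right.select()
        x_wins = duel(utilities, x, y, rng)
        # Each learner sees only a binary win/loss, never the utilities.
        left.update(x, 1.0 if x_wins else 0.0)
        right.update(y, 0.0 if x_wins else 1.0)
        plays[x] += 1
        plays[y] += 1
    return plays


if __name__ == "__main__":
    print(spar([0.1, 0.5, 0.9], horizon=5000))
```

In this sketch neither learner ever observes a utility value; the play counts should concentrate on the arm with the highest hidden utility as the horizon grows.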

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Auction Theory and Applications