Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

Aadirupa Saha; Pierre Gaillard

arXiv:2202.06694 · cs.LG · February 15, 2022

TL;DR

This paper introduces a unified algorithm for dueling bandits that performs optimally in both stochastic and adversarial environments, achieving instance-specific regret bounds and robustness to corrupted preferences.

Contribution

It presents the first best-of-both-worlds algorithm for dueling bandits with optimal regret bounds, together with a novel reduction from dueling bandits to multi-armed bandits that simplifies the analysis and improves existing guarantees.

Findings

01

Achieves the optimal $O\left(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\right)$ regret bound against the Condorcet-winner benchmark, where $\Delta_i$ is the gap of arm $i$.

02

Proves that the algorithm is robust to adversarially corrupted preferences and attains optimal regret in that setting.

03

Empirically outperforms existing dueling bandit algorithms.
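To make the instance-dependent bound in the first finding concrete, here is a small Python sketch. It assumes the standard dueling-bandit conventions, which this page does not spell out: `P[i][j]` is the probability that arm `i` beats arm `j`, a Condorcet winner `i*` beats every other arm with probability above 1/2, and the gap is $\Delta_i = P[i^*][i] - 1/2$. The helper names are hypothetical, and the computed quantity drops the constants hidden in the $O(\cdot)$.

```python
import math

def condorcet_winner(P):
    """Return the arm i with P[i][j] > 1/2 for every j != i, or None if none exists."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None

def gaps(P, star):
    """Gap of each suboptimal arm: Delta_i = P[star][i] - 1/2 (assumed convention)."""
    return {i: P[star][i] - 0.5 for i in range(len(P)) if i != star}

def condorcet_bound(P, T):
    """sum_i log(T) / Delta_i, i.e. the bound's scaling with constants dropped."""
    star = condorcet_winner(P)
    if star is None:
        raise ValueError("preference matrix has no Condorcet winner")
    return sum(math.log(T) / d for d in gaps(P, star).values())

# P[i][j] = probability that arm i beats arm j; arm 0 is the Condorcet winner.
P = [[0.5, 0.6, 0.7],
     [0.4, 0.5, 0.6],
     [0.3, 0.4, 0.5]]
print(condorcet_winner(P))          # 0
print(condorcet_bound(P, T=1000))   # roughly 103.6, i.e. 15 * log(1000)
```

Here the gaps are 0.1 and 0.2, so the bound scales as $(1/0.1 + 1/0.2)\log T = 15 \log T$: arms that are nearly as good as the winner dominate the regret. A cyclic, rock-paper-scissors-like matrix has no Condorcet winner, and the sketch raises an error in that case.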

Abstract

We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pairs of decision points queried in an online sequential manner. We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits, and despite its simplicity, it allows us to improve many existing results in dueling bandits. In particular, we give the first best-of-both-worlds result for the dueling bandits regret minimization problem: a unified framework that is guaranteed to perform optimally for both stochastic and adversarial preferences simultaneously. Moreover, our algorithm is also the first to achieve an optimal $O\left(\sum_{i=1}^{K} \frac{\log T}{\Delta_i}\right)$ regret bound against the Condorcet-winner benchmark, which scales optimally both in terms of the arm-size…
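The abstract does not spell out the reduction, so the following is only an illustrative sketch of the general idea of driving a dueling bandit with a multi-armed bandit learner, not the authors' algorithm. It assumes that both arms of a duel are drawn independently from the base learner's distribution `p`, that only the first arm `x` is charged the importance-weighted loss `1{x lost} / p[x]`, and that the base learner is a loss-based EXP3 with a fixed learning rate `eta`. EXP3 is swapped in purely for brevity; a best-of-both-worlds guarantee would need a base learner that itself has one (Tsallis-INF is a natural candidate). The function names and the preference matrix are hypothetical.

```python
import math
import random

def _distribution(log_w):
    """Turn EXP3 log-weights into probabilities (shifted by the max for stability)."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def dueling_via_exp3(P, T, eta, seed=0):
    """Hypothetical dueling-to-MAB reduction with loss-based EXP3 as the base learner.

    P[i][j] is the probability that arm i beats arm j (used only to simulate feedback).
    Returns the final sampling distribution and how often each arm was drawn as x.
    """
    rng = random.Random(seed)
    K = len(P)
    log_w = [0.0] * K   # EXP3 log-weights: -eta * (estimated cumulative loss)
    counts = [0] * K    # rounds in which each arm was the first duelist x
    for _ in range(T):
        p = _distribution(log_w)
        x, y = rng.choices(range(K), weights=p, k=2)  # two i.i.d. draws from p
        x_lost = rng.random() >= P[x][y]              # simulated binary preference
        if x_lost:
            log_w[x] -= eta / p[x]                    # importance-weighted loss of 1
        counts[x] += 1
    return _distribution(log_w), counts

# Arm 0 is the Condorcet winner of this (hypothetical) preference matrix.
P = [[0.5, 0.8, 0.9],
     [0.2, 0.5, 0.7],
     [0.1, 0.3, 0.5]]
p_final, counts = dueling_via_exp3(P, T=5000, eta=0.05)
print(counts, [round(q, 3) for q in p_final])
```

Because `y` is drawn from the same `p`, the loss charged to arm `i` equals, in expectation, the probability that `i` loses to an opponent sampled from the learner's own distribution. This is what turns pairwise feedback into an ordinary single-arm bandit loss, so any multi-armed bandit learner can be plugged in as the base algorithm.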

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Auction Theory and Applications