Adversarial Dueling Bandits

Aadirupa Saha; Tomer Koren; Yishay Mansour

arXiv:2010.14563·cs.LG·October 29, 2020

TL;DR

This paper studies regret minimization in adversarial dueling bandits, where pairwise win-loss feedback comes from preference matrices that may be chosen adversarially. It gives algorithms with matching upper and lower regret bounds in both the general and the fixed-gap settings.

Contribution

The paper proposes algorithms with tight regret bounds for adversarial dueling bandits, measured against the Borda winner in the general setting and in a simpler fixed-gap model.

Findings

01

Achieves an $\tilde{O}(K^{1/3}T^{2/3})$ regret bound against the Borda winner in the adversarial setting.

02

Provides a lower bound of $\Omega(K^{1/3}T^{2/3})$, matching the upper bound up to logarithmic factors.

03

In the fixed-gap setup, offers an algorithm with $\tilde{O}((K/\Delta^2)\log T)$ regret, along with tight lower bounds.

Abstract

We introduce the problem of regret minimization in Adversarial Dueling Bandits. As in classic Dueling Bandits, the learner has to repeatedly choose a pair of items and observe only a relative binary 'win-loss' feedback for this pair, but here this feedback is generated from an arbitrary preference matrix, possibly chosen adversarially. Our main result is an algorithm whose $T$-round regret compared to the Borda-winner from a set of $K$ items is $\tilde{O}(K^{1/3} T^{2/3})$, as well as a matching $\Omega(K^{1/3} T^{2/3})$ lower bound. We also prove a similar high-probability regret bound. We further consider a simpler fixed-gap adversarial setup, which bridges between two extreme preference feedback models for dueling bandits: stationary preferences and an arbitrary sequence of preferences. For the fixed-gap adversarial setup we give an $\tilde{O}((K/\Delta^2)\log{T})$…
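To make the quantities in the abstract concrete, here is a minimal Python sketch of the feedback model and of regret against the Borda winner. It is an illustration, not the paper's algorithm. The abstract does not define the Borda score, so the sketch assumes the standard definition: the score of item i under a preference matrix P is the average of P[i][j] over the other items j, and per-round regret is the best fixed item's score minus the average score of the two items played. The function names `borda_scores`, `duel`, and `borda_regret` are hypothetical.

```python
import random


def borda_scores(P):
    """Borda score of each item: average probability of beating another item.

    Assumes P[i][j] is the probability that item i beats item j.
    """
    K = len(P)
    return [sum(P[i][j] for j in range(K) if j != i) / (K - 1) for i in range(K)]


def duel(P, x, y, rng):
    """Binary win-loss feedback: 1 if x beats y, drawn with probability P[x][y]."""
    return 1 if rng.random() < P[x][y] else 0


def borda_regret(matrices, pairs):
    """Regret against the best fixed item in hindsight (assumed definition).

    matrices: one preference matrix per round, possibly different each round.
    pairs:    the (x, y) pair the learner played in each round.
    """
    K = len(matrices[0])
    per_round = [borda_scores(P) for P in matrices]
    totals = [sum(b[i] for b in per_round) for i in range(K)]
    best = max(range(K), key=lambda i: totals[i])
    return sum(b[best] - (b[x] + b[y]) / 2 for b, (x, y) in zip(per_round, pairs))


# Two rounds with different preference matrices (made-up numbers).
P1 = [[0.5, 0.9, 0.7], [0.1, 0.5, 0.6], [0.3, 0.4, 0.5]]
P2 = [[0.5, 0.2, 0.6], [0.8, 0.5, 0.7], [0.4, 0.3, 0.5]]
rng = random.Random(0)
outcome = duel(P1, 0, 1, rng)  # the only signal the learner observes
regret = borda_regret([P1, P2], [(1, 2), (0, 0)])
```

In this toy sequence item 0 has the highest cumulative Borda score (0.8 + 0.4), so playing the pair (1, 2) in the first round costs 0.8 - 0.35 = 0.45, and playing (0, 0) in the second round costs nothing. The learner never sees these scores directly. It only sees outcomes like `outcome`, which is what makes estimating them from duels the hard part.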

Videos

Adversarial Dueling Bandits · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems