Dueling Bandits With Weak Regret

Bangrui Chen; Peter I. Frazier

arXiv:1706.04304 · cs.LG · June 15, 2017 · 1 citation

TL;DR

This paper introduces the Winner Stays (WS) algorithm for dueling bandits, with variants that minimize weak regret (WS-W) and strong regret (WS-S) in content recommendation with pairwise feedback, and reports that it outperforms existing methods in simulations and on real data.

Contribution

The paper proposes Winner Stays, the first dueling bandit algorithm designed to minimize weak regret, with theoretical guarantees and practical efficiency in both the weak-regret and strong-regret settings.

Findings

1. WS-W achieves constant weak regret: its expected cumulative weak regret does not grow with the time horizon T.
2. WS outperforms existing algorithms in simulations.
3. WS remains computationally simple even with many arms.
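The abstract does not spell out the Winner Stays update rule, so the following is only a minimal sketch of a rule in the spirit of the name, written to illustrate the first finding. The scoring scheme (winner's score +1, loser's score -1, the previous winner kept as the first arm when it is among the leaders) and all function names (`total_order_matrix`, `weak_regret`, `winner_stays_weak`) are hypothetical, and the paper's exact tie-breaking may differ. Per-round weak regret is taken as $\min(\epsilon_{c,i}, \epsilon_{c,j})$ with $\epsilon_{c,k} = P(c \text{ beats } k) - 1/2$, which is an assumed convention.

```python
import random

# Hypothetical helpers for illustration; not taken from the paper.

def total_order_matrix(n, p=0.8):
    """P[i][j] = prob. arm i beats arm j; lower index is better, so arm 0 is the Condorcet winner."""
    return [[0.5 if i == j else (p if i < j else 1 - p) for j in range(n)]
            for i in range(n)]

def weak_regret(P, c, i, j):
    """Assumed convention: min(eps_ci, eps_cj) with eps_ck = P[c][k] - 1/2 (zero when k == c)."""
    return min(P[c][i] - 0.5, P[c][j] - 0.5)

def winner_stays_weak(P, horizon, seed=0):
    """Sketch of a winner-stays rule; returns the (first, second) pair dueled each round."""
    rng = random.Random(seed)
    n = len(P)
    score = [0] * n      # wins minus losses for each arm
    incumbent = None     # winner of the previous duel
    pairs = []
    for _ in range(horizon):
        best = max(score)
        leaders = [i for i in range(n) if score[i] == best]
        # The previous winner stays as the first arm if it is among the leaders.
        first = incumbent if incumbent in leaders else rng.choice(leaders)
        others = [i for i in range(n) if i != first]
        top = max(score[i] for i in others)
        # Challenger: highest-scoring other arm, ties broken at random.
        second = rng.choice([i for i in others if score[i] == top])
        # Simulate the noisy pairwise comparison.
        if rng.random() < P[first][second]:
            winner, loser = first, second
        else:
            winner, loser = second, first
        score[winner] += 1
        score[loser] -= 1
        incumbent = winner
        pairs.append((first, second))
    return pairs

P = total_order_matrix(4)
pairs = winner_stays_weak(P, horizon=5000, seed=1)
per_round = [weak_regret(P, 0, i, j) for i, j in pairs]
print(f"weak regret, first 1000 rounds: {sum(per_round[:1000]):.1f}")
print(f"weak regret, last 1000 rounds:  {sum(per_round[-1000:]):.1f}")
```

In this toy run, once arm 0 (the Condorcet winner) takes the lead it keeps being dueled, so per-round weak regret drops to zero and the cumulative total stops growing. This illustrates, but does not prove, the constant-regret claim.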

Abstract

We consider online content recommendation with implicit feedback through pairwise comparisons, formalized as the so-called dueling bandit problem. We study the dueling bandit problem in the Condorcet winner setting, and consider two notions of regret: the more well-studied strong regret, which is 0 only when both arms pulled are the Condorcet winner; and the less well-studied weak regret, which is 0 if either arm pulled is the Condorcet winner. We propose a new algorithm for this problem, Winner Stays (WS), with variations for each kind of regret: WS for weak regret (WS-W) has expected cumulative weak regret that is $O(N^2)$, and $O(N \log N)$ if arms have a total order; WS for strong regret (WS-S) has expected cumulative strong regret of $O(N^2 + N \log T)$, and $O(N \log N + N \log T)$ if arms have a total order. WS-W is the first dueling bandit algorithm with weak regret that is…
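The abstract states only when each regret is zero. As a hedged sketch, the snippet below assumes the convention, common in the dueling bandit literature, of writing $\epsilon_{c,k} = P(c \text{ beats } k) - 1/2$ for the Condorcet winner $c$, with strong regret $(\epsilon_{c,i} + \epsilon_{c,j})/2$ and weak regret $\min(\epsilon_{c,i}, \epsilon_{c,j})$ for the pulled pair $(i, j)$; the paper's exact normalization may differ. It also checks whether the arms have a total order, the condition under which the bounds above improve from $N^2$ to $N \log N$ terms. All function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]  # P[i][j] = probability that arm i beats arm j

def condorcet_winner(P: Matrix) -> Optional[int]:
    """Arm that beats every other arm with probability > 1/2, or None if none exists."""
    n = len(P)
    for c in range(n):
        if all(P[c][k] > 0.5 for k in range(n) if k != c):
            return c
    return None

def has_total_order(P: Matrix) -> bool:
    """True if some ranking has every arm beating all lower-ranked arms w.p. > 1/2."""
    n = len(P)
    # In a total order the arm ranked r beats exactly n - 1 - r others,
    # so sorting by win count recovers the only candidate ranking.
    wins = [sum(P[i][k] > 0.5 for k in range(n) if k != i) for i in range(n)]
    ranking = sorted(range(n), key=lambda i: -wins[i])
    return all(P[ranking[a]][ranking[b]] > 0.5
               for a in range(n) for b in range(a + 1, n))

def strong_regret(P: Matrix, c: int, i: int, j: int) -> float:
    """(eps_ci + eps_cj) / 2: zero only when both pulled arms are the winner c."""
    return ((P[c][i] - 0.5) + (P[c][j] - 0.5)) / 2

def weak_regret(P: Matrix, c: int, i: int, j: int) -> float:
    """min(eps_ci, eps_cj): zero as soon as either pulled arm is the winner c."""
    return min(P[c][i] - 0.5, P[c][j] - 0.5)

# Arm 0 beats everyone, but arms 1 -> 2 -> 3 -> 1 form a preference cycle.
P = [
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6, 0.4],
    [0.3, 0.4, 0.5, 0.6],
    [0.2, 0.6, 0.4, 0.5],
]
c = condorcet_winner(P)
print(c, has_total_order(P))
print(weak_regret(P, c, c, 2), strong_regret(P, c, c, 2))
```

Here arm 0 is a Condorcet winner even though the arms admit no total order, and dueling arm 0 against arm 2 incurs zero weak regret but positive strong regret.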

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms