Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem

Masrour Zoghi; Shimon Whiteson; Remi Munos; Maarten de Rijke

arXiv:1312.3393 · cs.LG · December 18, 2013 · 57 cites

TL;DR

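This paper introduces an Upper Confidence Bound-based algorithm for the K-armed dueling bandit problem, in which the only feedback is relative: the outcome of a comparison between a pair of arms. The method comes with a finite-time regret bound of order O(log t) and outperforms the state of the art in experiments on real data from an information retrieval application.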

Contribution

It extends the UCB algorithm to the dueling bandit setting, providing theoretical regret bounds and improved empirical results over existing methods.

Findings

01

Achieves a finite-time regret bound of order O(log t)

02

Outperforms the state of the art in experiments on real data from an information retrieval application

03

Works with relative feedback only, i.e. comparisons between pairs of arms rather than absolute rewards

Abstract

This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a finite-time regret bound of order O(log t). In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art.
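The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' reference implementation: the confidence radius `sqrt(alpha * ln(t) / n)`, the default `alpha=0.51`, the fully optimistic value for pairs that have never been compared, and the tie-breaking are all assumptions that this page does not state, and the function names `rucb_select` and `simulate` are hypothetical.

```python
import math
import random


def rucb_select(wins, t, alpha=0.51, rng=random):
    """Pick a pair (c, d) to duel at round t >= 1.

    wins[i][j] counts how often arm i has beaten arm j so far.
    """
    k = len(wins)
    # u[i][j] is an optimistic estimate of P(arm i beats arm j).
    u = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            n = wins[i][j] + wins[j][i]
            if n == 0:
                u[i][j] = 1.0  # assumed convention: no data means full optimism
            else:
                u[i][j] = wins[i][j] / n + math.sqrt(alpha * math.log(t) / n)
    # Promising arms: those that optimistically beat or tie every other arm.
    champions = [c for c in range(k) if all(u[c][j] >= 0.5 for j in range(k))]
    c = rng.choice(champions) if champions else rng.randrange(k)
    # Challenger: highest upper bound when c is used as the benchmark.
    d = max(range(k), key=lambda j: u[j][c])
    return c, d


def simulate(pref, rounds, alpha=0.51, seed=0):
    """Run the sketch against a preference matrix; pref[i][j] = P(i beats j)."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    for t in range(1, rounds + 1):
        c, d = rucb_select(wins, t, alpha, rng)
        if c == d:
            continue  # an arm dueling itself yields no new information
        if rng.random() < pref[c][d]:
            wins[c][d] += 1
        else:
            wins[d][c] += 1
    return wins
```

Calling `simulate(pref, rounds)` with a hypothetical preference matrix in which one arm beats every other arm with probability above 0.5 returns the matrix of pairwise win counts. Using the chosen arm `c` as the benchmark means the challenger is scored only on its optimistic chance of beating `c`, which mirrors the "winner as a benchmark" step in the abstract.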

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications