Dueling Bandits with Qualitative Feedback

Liyuan Xu; Junya Honda; Masashi Sugiyama

arXiv:1809.05274 · stat.ML · September 19, 2018

TL;DR

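This paper introduces the qualitative dueling bandit (QDB) problem, in which an agent receives qualitative rather than numeric feedback from each arm it pulls. It proposes direct algorithms that estimate the probability of one arm beating another without carrying out actual duels, and these outperform classic dueling bandit (DB) algorithms applied to the same problem.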

Contribution

The paper formulates the QDB problem and develops novel direct algorithms that utilize qualitative feedback, leading to improved performance over existing dueling bandit methods.

Findings

01. Proposed algorithms outperform classic dueling bandit algorithms in QDB settings.

02. Theoretical analysis confirms significant performance improvements.

03. Experimental results demonstrate vast improvements over existing methods.

Abstract

We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem, where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms for solving the QDB problem, this reduction significantly worsens the performance. In fact, in the QDB problem, the probability that one arm wins the duel over another arm can be directly estimated without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over the existing DB…
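To illustrate the abstract's point that win probabilities can be estimated without actual duels, here is a minimal Python sketch. It is not the paper's algorithm. It assumes the qualitative feedback is an ordinal label encoded as a comparable integer, that arms are pulled independently, and that a tie counts as half a win. The function name `win_probability` is hypothetical.

```python
from collections import Counter


def win_probability(samples_i, samples_j):
    """Estimate P(arm i beats arm j) from each arm's own feedback history.

    samples_i, samples_j: ordinal feedback values (e.g. 1 = bad, 3 = good)
    collected by pulling each arm separately; no duel is ever run.
    Assumption: a tie is scored as half a win for each arm.
    """
    n_i, n_j = len(samples_i), len(samples_j)
    if n_i == 0 or n_j == 0:
        raise ValueError("each arm needs at least one observation")

    counts_i = Counter(samples_i)
    counts_j = Counter(samples_j)

    # Compare every observed value of arm i against every observed value
    # of arm j, weighted by how often each value occurred.
    score = 0.0
    for value_i, c_i in counts_i.items():
        for value_j, c_j in counts_j.items():
            if value_i > value_j:
                score += c_i * c_j
            elif value_i == value_j:
                score += 0.5 * c_i * c_j
    return score / (n_i * n_j)


# Feedback histories gathered by pulling each arm on its own.
arm_a = [3, 3, 2]
arm_b = [1, 2]
p_ab = win_probability(arm_a, arm_b)
p_ba = win_probability(arm_b, arm_a)
```

Under these assumptions, every pull updates the estimates for all pairs involving the pulled arm, whereas a duel-based reduction learns about only one pair per comparison. This is one intuition for why the naive reduction to DB wastes information.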


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications