Dueling Bandits with Qualitative Feedback
Liyuan Xu, Junya Honda, Masashi Sugiyama

TL;DR
This paper introduces the qualitative dueling bandit (QDB) problem, in which an agent receives qualitative rather than numeric feedback, and proposes direct algorithms that outperform classic dueling bandit algorithms by exploiting that feedback.
Contribution
The paper formulates the QDB problem and develops novel direct algorithms that utilize qualitative feedback, leading to improved performance over existing dueling bandit methods.
Findings
The proposed direct algorithms outperform classic dueling bandit (DB) algorithms in QDB settings.
Theoretical analysis shows that the gain comes from incorporating the qualitative feedback rather than carrying out actual duels.
Experimental results also show large improvements over existing DB methods.
Abstract
We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms for solving the QDB problem, this reduction significantly worsens the performance---actually, in the QDB problem, the probability that one arm wins the duel over another arm can be directly estimated without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over the existing DB…
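The direct estimation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the qualitative feedback takes values on an ordered scale (encoded here as comparable integers) and that samples from the two arms are independent. The function name `win_probability` and the `tie_weight` parameter are hypothetical, and counting a tie as half a win is an assumption made for illustration, not something stated in the abstract. This is not the authors' algorithm, only the estimation step that makes actual duels unnecessary.

```python
from collections import Counter


def win_probability(samples_i, samples_j, tie_weight=0.5):
    """Estimate P(arm i beats arm j) from separately collected ordinal samples.

    Assumption: feedback values are comparable with > and ==, and a tie
    contributes `tie_weight` of a win (0.5 by default).
    """
    if not samples_i or not samples_j:
        raise ValueError("both arms need at least one observation")

    # Empirical counts of each feedback level for the two arms.
    counts_i = Counter(samples_i)
    counts_j = Counter(samples_j)

    # Sum over every pair of levels, weighted by how often each pair occurs.
    total = 0.0
    for level_i, c_i in counts_i.items():
        for level_j, c_j in counts_j.items():
            if level_i > level_j:
                total += c_i * c_j
            elif level_i == level_j:
                total += tie_weight * c_i * c_j

    return total / (len(samples_i) * len(samples_j))


# Hypothetical feedback on a 3-level scale (1 = worst, 3 = best).
arm_a = [3, 3, 2]
arm_b = [1, 2, 2]
p_ab = win_probability(arm_a, arm_b)
p_ba = win_probability(arm_b, arm_a)
```

Each arm is pulled on its own, so every new observation of one arm updates its estimated win probability against all other arms at once. A classic DB algorithm would instead learn about only one pair per duel.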
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications
