Preference-based Online Learning with Dueling Bandits: A Survey
Viktor Bengs, Robert Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier

TL;DR
This survey reviews the current state of preference-based multi-armed bandit problems, focusing on dueling bandits where feedback is based on relative preferences rather than numerical rewards, highlighting various problem settings and methods.
Contribution
It provides a comprehensive taxonomy and overview of methods for preference-based bandits, clarifying assumptions and feedback properties in this research area.
Findings
Summarizes different problem formulations in dueling bandits
Classifies methods based on data-generating assumptions
Highlights open challenges and future directions
Abstract
In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available -- instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, referred to as…
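To make the feedback model concrete, the following is a minimal sketch of learning from pairwise comparisons rather than numerical rewards. It is not taken from the survey: the preference matrix `pref` (where `pref[i][j]` is the assumed probability that arm `i` beats arm `j`), the uniform random choice of pairs, and the Copeland-style count of pairwise wins used to pick a winner are all illustrative assumptions.

```python
import random


def duel(pref, i, j, rng):
    """Return True if arm i wins one noisy comparison against arm j.

    pref[i][j] is a hypothetical probability that i is preferred to j.
    """
    return rng.random() < pref[i][j]


def estimate_preferences(pref, rounds, rng):
    """Estimate pairwise win probabilities from binary duel outcomes only."""
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    for _ in range(rounds):
        # Assumption: pairs are drawn uniformly at random (pure exploration).
        i, j = rng.sample(range(k), 2)
        if duel(pref, i, j, rng):
            wins[i][j] += 1
        else:
            wins[j][i] += 1

    est = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            n = wins[i][j] + wins[j][i]
            if i != j and n > 0:
                est[i][j] = wins[i][j] / n
    return est


def copeland_winner(est):
    """Return the arm that beats the most other arms (estimate > 0.5)."""
    scores = [
        sum(1 for j, p in enumerate(row) if j != i and p > 0.5)
        for i, row in enumerate(est)
    ]
    return max(range(len(est)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical 3-arm problem: arm 0 tends to beat arms 1 and 2.
    pref = [
        [0.5, 0.7, 0.8],
        [0.3, 0.5, 0.6],
        [0.2, 0.4, 0.5],
    ]
    est = estimate_preferences(pref, rounds=3000, rng=random.Random(0))
    print("estimated winner:", copeland_winner(est))
```

The learner never observes a reward value here, only which of two arms won each duel. A practical algorithm would choose the pairs adaptively instead of uniformly, to balance exploration and exploitation.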
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms
