Preference-based Online Learning with Dueling Bandits: A Survey

Viktor Bengs; Robert Busa-Fekete; Adil El Mesaoudi-Paul; Eyke Hüllermeier

arXiv:1807.11398 · cs.LG · July 13, 2021 · 24 cites

Open Access

TL;DR

This survey reviews the state of the art in preference-based multi-armed bandit problems. It focuses on dueling bandits, where feedback takes the form of relative preferences rather than numerical rewards, and covers the main problem settings and methods.

Contribution

It provides a comprehensive taxonomy and overview of methods for preference-based bandits, clarifying assumptions and feedback properties in this research area.

Findings

01 Summarizes different problem formulations in dueling bandits

02 Classifies methods based on data-generating assumptions

03 Highlights open challenges and future directions

Abstract

In machine learning, the notion of multi-armed bandits refers to a class of online learning problems, in which an agent is supposed to simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. In the standard setting, the agent learns from stochastic feedback in the form of real-valued rewards. In many applications, however, numerical reward signals are not readily available -- instead, only weaker information is provided, in particular relative preferences in the form of qualitative comparisons between pairs of alternatives. This observation has motivated the study of variants of the multi-armed bandit problem, in which more general representations are used both for the type of feedback to learn from and the target of prediction. The aim of this paper is to provide a survey of the state of the art in this field, referred to as…
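To make the feedback model described in the abstract concrete, the following is a minimal sketch of learning from pairwise comparisons instead of numerical rewards. Everything in it is an illustrative assumption rather than a method from the survey: the 3-arm preference matrix `PREF`, the helper names `duel` and `estimate_win_rates`, and the uniform random pairing of arms, which is pure exploration and not a regret-minimizing dueling bandit algorithm.

```python
import random


def duel(pref, i, j, rng):
    """Return True if arm i wins one noisy comparison against arm j."""
    return rng.random() < pref[i][j]


def estimate_win_rates(pref, rounds, seed=0):
    """Duel uniformly random pairs and return (best arm, empirical win rates).

    Assumption: uniform random pairing, chosen only to keep the sketch short.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [0] * k
    plays = [0] * k
    for _ in range(rounds):
        # Pick two distinct arms; the learner only observes which one wins.
        i, j = rng.sample(range(k), 2)
        winner = i if duel(pref, i, j, rng) else j
        wins[winner] += 1
        plays[i] += 1
        plays[j] += 1
    rates = [wins[a] / plays[a] if plays[a] else 0.0 for a in range(k)]
    best = max(range(k), key=rates.__getitem__)
    return best, rates


# Hypothetical preference matrix: PREF[i][j] is the probability
# that arm i beats arm j, so PREF[i][j] + PREF[j][i] == 1.
PREF = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]

best, rates = estimate_win_rates(PREF, rounds=5000)
```

Under these assumptions, the arm with the highest empirical win rate approximates the arm that wins most often against a uniformly random opponent. The learner never sees a numerical reward for a single arm, only the outcome of each comparison.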

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms