Non-Stationary Dueling Bandits

Patrick Kolpaczki; Viktor Bengs; Eyke Hüllermeier

arXiv:2202.00935 · cs.LG · February 3, 2022

TL;DR

This paper addresses the non-stationary dueling bandits problem, in which the preference matrix over arms changes between stationary segments. It proposes algorithms that adapt to these changing preference matrices, proves regret bounds for them, and establishes a lower bound for the problem.

Contribution

It introduces the Beat the Winner Reset algorithm, which has a tighter regret bound, and two meta-algorithms, DETECT and Monitored Dueling Bandits, for handling non-stationarity without prior knowledge of the number or length of the segments.

Findings

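1. Beat the Winner Reset achieves a tighter bound on expected binary weak regret in the stationary case than current state-of-the-art algorithms.

2. The meta-algorithms detect changes in the preference matrix and adapt to them without prior knowledge.

3. A theoretical lower bound establishes the limits of what is achievable in non-stationary dueling bandits.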

Abstract

We study the non-stationary dueling bandits problem with $K$ arms, where the time horizon $T$ consists of $M$ stationary segments, each associated with its own preference matrix. The learner repeatedly selects a pair of arms and observes a binary preference between them as feedback. To minimize the accumulated regret, the learner needs to pick the Condorcet winner of each stationary segment as often as possible, even though the preference matrices and segment lengths are unknown. We propose the Beat the Winner Reset algorithm and prove a bound on its expected binary weak regret in the stationary case, which tightens the bound of current state-of-the-art algorithms. We also show a regret bound for the non-stationary case that does not require knowledge of $M$ or $T$. We further propose and analyze two meta-algorithms, DETECT for weak regret and…
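The setting in the abstract can be made concrete with a small sketch. It assumes a preference matrix `P` where `P[i][j]` is the probability that arm `i` beats arm `j` (with `P[j][i] = 1 - P[i][j]`), represents the $M$ stationary segments as hypothetical `(start_round, matrix)` pairs, and takes binary weak regret to charge 1 in every round where neither selected arm is the current segment's Condorcet winner. The helper names are illustrative and not taken from the paper, and the code does not implement Beat the Winner Reset itself; it only shows the environment and how the regret is counted.

```python
import bisect
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: P[i][j] = probability that arm i beats arm j.
Matrix = List[List[float]]


def condorcet_winner(P: Matrix) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def duel(P: Matrix, i: int, j: int, rng: random.Random) -> int:
    """Sample the binary feedback of one duel: 1 if arm i is preferred to arm j, else 0."""
    return 1 if rng.random() < P[i][j] else 0


def binary_weak_regret(
    duels: Sequence[Tuple[int, int]],
    segments: Sequence[Tuple[int, Matrix]],
) -> int:
    """Count rounds in which neither selected arm is the current Condorcet winner.

    Assumption: segments are (start_round, matrix) pairs sorted by start_round,
    the first starting at round 0, and every matrix has a Condorcet winner.
    """
    winners = []
    for start, P in segments:
        w = condorcet_winner(P)
        if w is None:
            raise ValueError(f"segment starting at round {start} has no Condorcet winner")
        winners.append(w)
    starts = [start for start, _ in segments]

    regret = 0
    for t, (i, j) in enumerate(duels):
        seg = bisect.bisect_right(starts, t) - 1  # index of the segment active at round t
        if winners[seg] not in (i, j):
            regret += 1
    return regret


# Example with K = 3 arms and M = 2 segments (illustrative numbers only).
P_seg1 = [[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]]  # arm 0 is the Condorcet winner
P_seg2 = [[0.5, 0.6, 0.3], [0.4, 0.5, 0.2], [0.7, 0.8, 0.5]]  # arm 2 is the Condorcet winner
example_segments = [(0, P_seg1), (3, P_seg2)]                # the change happens at round 3
example_duels = [(0, 1), (1, 2), (0, 2), (0, 1), (1, 2), (2, 0)]

print(binary_weak_regret(example_duels, example_segments))  # prints 2
```

In the example, the learner is charged once in round 1 (neither arm 1 nor arm 2 is the first segment's winner) and once in round 3, where it keeps playing the old pair after the winner has switched to arm 2. A learner that does not notice the change would keep paying in this way, which is the situation that resetting or change-detection strategies aim to avoid.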

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems