Non-Stationary Dueling Bandits
Patrick Kolpaczki, Viktor Bengs, Eyke Hüllermeier

TL;DR
This paper addresses the non-stationary dueling bandits problem, in which the preference matrix changes between stationary segments. It proposes algorithms that adapt to these changes and proves regret upper bounds together with a lower bound.
Contribution
It introduces the Beat the Winner Reset algorithm, which has a tighter regret bound in the stationary case and a non-stationary regret bound that requires no prior knowledge of the number of segments or the time horizon. It also introduces two meta-algorithms, DETECT and Monitored Dueling Bandits, for handling non-stationarity.
Findings
Beat the Winner Reset achieves tighter regret bounds in stationary cases.
Meta-algorithms effectively detect changes and adapt without prior knowledge.
Theoretical lower bounds establish limits for non-stationary dueling bandits.
Abstract
We study the non-stationary dueling bandits problem with K arms, where the time horizon T consists of M stationary segments, each of which is associated with its own preference matrix. The learner repeatedly selects a pair of arms and observes a binary preference between them as feedback. To minimize the accumulated regret, the learner needs to pick the Condorcet winner of each stationary segment as often as possible, despite preference matrices and segment lengths being unknown. We propose the Beat the Winner Reset algorithm and prove a bound on its expected binary weak regret in the stationary case, which tightens the bound of current state-of-the-art algorithms. We also show a regret bound for the non-stationary case, without requiring knowledge of M or T. We further propose and analyze two meta-algorithms, DETECT for weak regret and…
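The setting described in the abstract can be made concrete with a small simulation. The sketch below is not taken from the paper: it assumes the common definition of binary weak regret (a round costs 1 when neither selected arm is the current Condorcet winner, and 0 otherwise), and the function names, the `policy` interface, and the example matrices are all hypothetical choices made for illustration.

```python
import random


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def duel(P, i, j, rng):
    """Binary feedback: True if arm i wins against arm j (probability P[i][j])."""
    return rng.random() < P[i][j]


def binary_weak_regret(segments, policy, rng):
    """Count rounds in which neither selected arm is the segment's Condorcet winner.

    segments: list of (length, preference_matrix) pairs, one per stationary segment.
    policy:   hypothetical callable (t, K, rng) -> (i, j) choosing the pair to duel.
              A real learner would also consume the feedback; this sketch ignores it.
    """
    regret, t = 0, 0
    for length, P in segments:
        winner = condorcet_winner(P)
        if winner is None:
            raise ValueError("segment has no Condorcet winner")
        for _ in range(length):
            i, j = policy(t, len(P), rng)
            duel(P, i, j, rng)  # feedback the learner would observe
            if winner not in (i, j):
                regret += 1
            t += 1
    return regret


# Illustrative preference matrices: arm 0 wins in the first segment, arm 2 in the second.
P1 = [[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]]
P2 = [[0.5, 0.4, 0.3], [0.6, 0.5, 0.2], [0.7, 0.8, 0.5]]


def uniform_policy(t, K, rng):
    """Baseline that picks two distinct arms uniformly at random."""
    i, j = rng.sample(range(K), 2)
    return i, j


if __name__ == "__main__":
    rng = random.Random(0)
    print(binary_weak_regret([(100, P1), (50, P2)], uniform_policy, rng))
```

A learner that keeps playing the first segment's winner after the change point accumulates one unit of regret per round unless its other arm happens to be the new winner, which is why adapting to changes of the preference matrix matters in this setting.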
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
