When Can We Track Significant Preference Shifts in Dueling Bandits?
Joe Suk, Arpit Agarwal

TL;DR
This paper asks when a dueling bandit algorithm can adapt to significant shifts in user preferences without knowing how many shifts occur, and shows that this is possible only for certain classes of preference distributions.
Contribution
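It analyzes dynamic regret for dueling bandits whose preferences shift over time, measured by the number of significant shifts, and identifies the classes of preference distributions for which regret adaptive to that unknown number is achievable and those for which it is impossible.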
Findings
Impossibility of $O(\sqrt{K\tilde{L}T})$ dynamic regret under the Condorcet and SST classes.
Feasibility of such regret bounds within the SST ∩ STI class.
Almost complete characterization of preference classes for adaptive dueling bandits.
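To make the class names above concrete, here is a minimal Python sketch that tests a single preference matrix for a Condorcet winner, strong stochastic transitivity (SST), and the stochastic triangle inequality (STI). It assumes the standard textbook definitions in terms of the gap `P[i][j] - 0.5`, which may differ in detail from the paper's formulation; the function names and example matrices are illustrative only, and the brute-force search over orderings is only practical for a handful of arms.

```python
from itertools import combinations, permutations

TOL = 1e-12  # tolerance for floating-point comparisons


def gap(P, i, j):
    """Preference gap of arm i over arm j, where P[i][j] = Pr(i beats j)."""
    return P[i][j] - 0.5


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def _holds_for_some_order(P, triple_ok):
    """Search for a total order (best arm first) that is consistent with all
    pairwise preferences and satisfies triple_ok on every ordered triple."""
    K = len(P)
    for order in permutations(range(K)):
        consistent = all(gap(P, order[a], order[b]) >= -TOL
                         for a, b in combinations(range(K), 2))
        if not consistent:
            continue
        if all(triple_ok(gap(P, order[a], order[b]),
                         gap(P, order[b], order[c]),
                         gap(P, order[a], order[c]))
               for a, b, c in combinations(range(K), 3)):
            return True
    return False


def is_sst(P):
    # SST: for i > j > k in the order, gap(i, k) >= max(gap(i, j), gap(j, k)).
    return _holds_for_some_order(P, lambda g_ij, g_jk, g_ik: g_ik >= max(g_ij, g_jk) - TOL)


def is_sti(P):
    # STI: for i > j > k in the order, gap(i, k) <= gap(i, j) + gap(j, k).
    return _holds_for_some_order(P, lambda g_ij, g_jk, g_ik: g_ik <= g_ij + g_jk + TOL)


# Hypothetical 3-arm examples (not taken from the paper).
P_additive = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.3, 0.4, 0.5]]     # SST and STI
P_mixed = [[0.5, 0.6, 0.55], [0.4, 0.5, 0.7], [0.45, 0.3, 0.5]]      # Condorcet, STI, not SST
P_cyclic = [[0.5, 0.6, 0.4], [0.4, 0.5, 0.6], [0.6, 0.4, 0.5]]       # no Condorcet winner

for name, P in [("additive", P_additive), ("mixed", P_mixed), ("cyclic", P_cyclic)]:
    print(name, condorcet_winner(P), is_sst(P), is_sti(P))
```

In the dueling bandit setting the learner never sees such a matrix directly, only noisy outcomes of the duels it chooses, and with distribution shifts the matrix itself changes over time. The sketch only illustrates how the classes nest for one fixed matrix.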
Abstract
The $K$-armed dueling bandits problem, where the feedback is in the form of noisy pairwise preferences, has been widely studied due to its applications in information retrieval, recommendation systems, etc. Motivated by concerns that user preferences/tastes can evolve over time, we consider the problem of dueling bandits with distribution shifts. Specifically, we study the recent notion of significant shifts (Suk and Kpotufe, 2022), and ask whether one can design an adaptive algorithm for the dueling problem with $O(\sqrt{K\tilde{L}T})$ dynamic regret, where $\tilde{L}$ is the (unknown) number of significant shifts in preferences. We show that the answer to this question depends on the properties of underlying preference distributions. Firstly, we give an impossibility result that rules out any algorithm with $O(\sqrt{K\tilde{L}T})$ dynamic regret under the well-studied Condorcet and SST…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms
