Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Aadirupa Saha, Shubham Gupta

TL;DR
This paper introduces optimal algorithms for dynamic regret minimization in non-stationary dueling bandits, achieving near-optimal guarantees under two notions of non-stationarity and validating their effectiveness through simulations.
Contribution
It proposes the first efficient algorithms with provably optimal dynamic regret bounds for non-stationary dueling bandits, covering both effective-switch and continuous-variation notions of non-stationarity.
Findings
Achieved $\tilde{O}(\sqrt{SKT})$ dynamic regret, where $S$ is the number of effective switches.
Achieved $\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic regret, where $V_T$ measures continuous variation.
Validated the algorithms through extensive simulations and comparisons.
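To make the two rates concrete, the sketch below evaluates their leading orders, dropping the constants and logarithmic factors hidden in $\tilde{O}$. The function names `switch_bound` and `variation_bound` and the problem sizes are hypothetical illustrations, not quantities taken from the paper.

```python
import math

def switch_bound(S, K, T):
    """Leading order of the effective-switches rate sqrt(S*K*T); constants and logs dropped."""
    return math.sqrt(S * K * T)

def variation_bound(V, K, T):
    """Leading order of the continuous-variation rate V^(1/3) K^(1/3) T^(2/3); constants and logs dropped."""
    return (V * K) ** (1 / 3) * T ** (2 / 3)

# Hypothetical problem sizes, chosen only to illustrate how the rates scale with T.
K, S, V = 10, 10, 1.0
for T in (10**4, 10**6, 10**8):
    # Dividing by T gives the average per-round regret implied by each rate.
    print(T, switch_bound(S, K, T) / T, variation_bound(V, K, T) / T)
```

Both per-round ratios shrink as $T$ grows (they equal $\sqrt{SK/T}$ and $(V_T K/T)^{1/3}$), so each bound is sublinear in $T$; for example, $S = K = 10$ and $T = 10^6$ give $\sqrt{SK/T} = 0.01$.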
Abstract
We study the problem of \emph{dynamic regret minimization} in $K$-armed Dueling Bandits under non-stationary or time-varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only relative binary `win-loss' feedback for this pair, sampled from an underlying preference matrix at that round. We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with a high-probability regret guarantee. We next use similar algorithmic ideas to propose an efficient and provably optimal algorithm for dynamic-regret minimization under two notions of non-stationarity. In particular, we establish $\tilde{O}(\sqrt{SKT})$ and $\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic-regret guarantees, $S$ being the total number of `effective-switches' in the underlying preference relations and…
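The feedback model above can be illustrated with a small simulation: at each round a time-varying preference matrix $P_t$ is in force, the agent picks a pair $(x_t, y_t)$, and it observes one Bernoulli bit with mean $P_t(x_t, y_t)$. This is a minimal sketch under stated assumptions, not the paper's algorithm or its exact definitions. The helpers `make_pref`, `duel`, `condorcet_winner` and `run` are hypothetical; the learner is a placeholder that picks pairs uniformly at random; dynamic regret is measured with the common Condorcet-winner form $\frac{1}{2}\sum_t \big(P_t(a_t^*, x_t) + P_t(a_t^*, y_t) - 1\big)$, which may differ from the paper's definition; and counting changes of the Condorcet winner is only a stand-in for the paper's notion of effective switches.

```python
import random

def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None

def make_pref(K, best, gap=0.2):
    """Hypothetical preference matrix: `best` beats every other arm w.p. 0.5 + gap; other pairs tie."""
    P = [[0.5] * K for _ in range(K)]
    for j in range(K):
        if j != best:
            P[best][j] = 0.5 + gap
            P[j][best] = 0.5 - gap
    return P

def duel(P, x, y, rng):
    """Relative win-loss feedback: 1 if x beats y, sampled with probability P[x][y]."""
    return 1 if rng.random() < P[x][y] else 0

def run(K=5, T=3000, switch_every=1000, seed=0):
    """Simulate a piecewise-stationary environment with a uniform-random placeholder learner."""
    rng = random.Random(seed)
    dyn_regret, switches, prev_best = 0.0, 0, None
    for t in range(T):
        best = (t // switch_every) % K  # assumption: the winner changes every `switch_every` rounds
        P = make_pref(K, best)
        if prev_best is not None and best != prev_best:
            switches += 1  # stand-in for an "effective switch"
        prev_best = best
        x, y = rng.randrange(K), rng.randrange(K)  # placeholder policy, not the paper's method
        _ = duel(P, x, y, rng)  # a real learner would update its estimates with this bit
        star = condorcet_winner(P)  # per-round benchmark arm, hence "dynamic" regret
        dyn_regret += (P[star][x] + P[star][y] - 1) / 2
    return dyn_regret, switches

reg, sw = run()
print(f"dynamic regret = {reg:.1f}, Condorcet-winner changes = {sw}")
```

Because the placeholder learner ignores its feedback, its regret grows linearly in $T$ (about $0.16$ per round in this configuration); the algorithms described in the paper aim instead for the sublinear rates stated above.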
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Auction Theory and Applications
