Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits

Aadirupa Saha; Shubham Gupta

arXiv:2111.03917 · cs.LG · June 14, 2022

TL;DR

This paper introduces efficient algorithms for dynamic regret minimization in non-stationary dueling bandits. The algorithms achieve optimal guarantees, up to logarithmic factors, under two measures of non-stationarity (effective switches and continuous variation), and their effectiveness is validated through simulations.

Contribution

It proposes the first efficient algorithms with provably optimal dynamic regret bounds for non-stationary dueling bandits, covering both effective-switches and continuous-variation non-stationarity.

Findings

1. Achieved $\tilde{O}(\sqrt{SKT})$ dynamic regret under $S$ effective switches.
2. Achieved $\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$ dynamic regret under continuous variation measured by $V_T$.
3. Validated the algorithms through extensive simulations and comparisons.
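To get a feel for how the two rates scale, the short Python sketch below evaluates them numerically. It drops all constants and logarithmic factors hidden by $\tilde{O}$, so the numbers are orders of magnitude only, not the paper's actual bounds; the function names and the chosen values of $K$, $T$, $S$ and $V_T$ are hypothetical.

```python
import math


def switches_rate(S: int, K: int, T: int) -> float:
    """sqrt(S*K*T): the effective-switches rate with constants and log factors dropped."""
    return math.sqrt(S * K * T)


def variation_rate(V_T: float, K: int, T: int) -> float:
    """V_T^(1/3) * K^(1/3) * T^(2/3): the continuous-variation rate, constants and logs dropped."""
    return (V_T * K) ** (1.0 / 3.0) * T ** (2.0 / 3.0)


if __name__ == "__main__":
    K, T = 10, 1_000_000  # illustrative values, not taken from the paper
    for S in (1, 10, 100):
        r = switches_rate(S, K, T)
        print(f"S={S:>3}: sqrt(SKT) ~ {r:,.0f}, per-round {r / T:.4f}")
    for V in (1.0, 10.0):
        r = variation_rate(V, K, T)
        print(f"V_T={V:>4}: V^(1/3)K^(1/3)T^(2/3) ~ {r:,.0f}, per-round {r / T:.4f}")
```

For example, with $K = 10$, $T = 10^6$ and $S = 10$, $\sqrt{SKT} = 10^4$, an average of 0.01 per round. Both rates are sublinear in $T$, so for fixed $S$ or $V_T$ the average per-round regret goes to zero as $T$ grows.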

Abstract

We study the problem of dynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time-varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only relative binary 'win-loss' feedback for this pair, sampled from an underlying preference matrix at that round. We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high-probability regret. We next use similar algorithmic ideas to propose an efficient and provably optimal algorithm for dynamic-regret minimization under two notions of non-stationarity. In particular, we establish $\tilde{O}(\sqrt{SKT})$ and $\tilde{O}(V_T^{1/3} K^{1/3} T^{2/3})$ dynamic-regret guarantees, $S$ being the total number of 'effective-switches' in the underlying preference relations and…
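The Python sketch below makes the feedback model and the dynamic-regret objective concrete. It rests on assumptions not stated in the abstract: every preference matrix $P_t$ has a Condorcet winner $a^*_t$, and the per-round regret of a duel $(x_t, y_t)$ is the averaged dueling regret $(P_t(a^*_t, x_t) + P_t(a^*_t, y_t) - 1)/2$, summed over rounds against the current winner. The preference instances, the helper names, and the uniform-random learner are hypothetical placeholders, not the authors' algorithm.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # P[i][j] = probability that arm i beats arm j


def condorcet_winner(P: Matrix) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 1/2, if one exists."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def duel(P: Matrix, x: int, y: int, rng: random.Random) -> int:
    """Binary win-loss feedback: 1 if x beats y, drawn as Bernoulli(P[x][y])."""
    return 1 if rng.random() < P[x][y] else 0


def round_regret(P: Matrix, x: int, y: int) -> float:
    """Assumed per-round regret: (P[a*][x] + P[a*][y] - 1) / 2 for the current winner a*."""
    a_star = condorcet_winner(P)
    if a_star is None:
        raise ValueError("this sketch assumes every P_t has a Condorcet winner")
    return (P[a_star][x] + P[a_star][y] - 1.0) / 2.0


def make_preferences(K: int, winner: int, gap: float) -> Matrix:
    """Hypothetical instance: `winner` beats every other arm w.p. 0.5 + gap; other pairs tie."""
    P = [[0.5] * K for _ in range(K)]
    for j in range(K):
        if j != winner:
            P[winner][j] = 0.5 + gap
            P[j][winner] = 0.5 - gap
    return P


def simulate(T: int, K: int, switch_at: int, seed: int = 0) -> Tuple[float, int]:
    """Run a placeholder learner on a sequence with one abrupt preference change."""
    rng = random.Random(seed)
    P_before = make_preferences(K, winner=0, gap=0.3)
    P_after = make_preferences(K, winner=K - 1, gap=0.3)
    total_regret, wins = 0.0, 0
    for t in range(T):
        P = P_before if t < switch_at else P_after  # the preference matrix changes once
        x, y = rng.randrange(K), rng.randrange(K)  # placeholder learner: uniform random pair
        wins += duel(P, x, y, rng)  # the only signal a real learner would observe
        total_regret += round_regret(P, x, y)  # uses P_t, which the learner never sees
    return total_regret, wins


if __name__ == "__main__":
    regret, wins = simulate(T=10_000, K=5, switch_at=5_000)
    print(f"dynamic regret of uniform-random pairs: {regret:.1f}; wins observed: {wins}")
```

Because the placeholder learner ignores its feedback, its expected regret is $\text{gap} \cdot (K-1)/K = 0.24$ per round here, which is linear in $T$. An algorithm with the guarantees above would instead track the changing winner and keep the total sublinear.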

Taxonomy

Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Auction Theory and Applications