Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives

S Akash; Pratik Gajane; Jawar Singh

arXiv:2603.18972 · cs.LG · May 19, 2026

TL;DR

This paper introduces the first unified algorithms for multi-dueling bandits that perform optimally in both stochastic and adversarial environments under Condorcet and Borda objectives, without prior knowledge of the environment.

Contribution

The paper proposes two algorithms, MetaDueling and SA-MiDEX, that achieve best-of-both-worlds regret bounds for multi-dueling bandits under the Condorcet and Borda objectives.

Findings

01

MetaDueling converts any dueling bandit algorithm into a multi-dueling bandit algorithm.

02

The proposed algorithms achieve near-optimal regret bounds in both stochastic and adversarial settings.

03

Matching lower bounds are provided for the Condorcet setting, confirming near-optimality.

Abstract

Multi-dueling bandits, where a learner selects $m \geq 2$ arms per round and observes only the winner, arise naturally in many applications including ranking and recommendation systems, yet a fundamental question has remained open: can a single algorithm perform optimally in both stochastic and adversarial environments, without knowing which regime it faces? We answer this affirmatively, providing the first best-of-both-worlds algorithms for multi-dueling bandits under both Condorcet and Borda objectives. For the Condorcet setting, we propose MetaDueling, a black-box reduction that converts any dueling bandit algorithm into a multi-dueling bandit algorithm by transforming multi-way winner feedback into an unbiased pairwise signal. Instantiating our reduction with Versatile-DB yields the first best-of-both-worlds algorithm for multi-dueling bandits: it achieves…
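The two objectives named in the abstract can select different arms, which may help explain why they are treated separately. The sketch below is not taken from the paper: it uses the standard textbook definitions (a Condorcet winner beats every other arm with probability above 1/2; an arm's Borda score is its average probability of beating another arm), and the preference matrix `P` and the helper names `condorcet_winner` and `borda_scores` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional

# Hypothetical pairwise preference matrix: P[i][j] is the assumed
# probability that arm i beats arm j, with P[i][j] + P[j][i] = 1.
P = [
    [0.5, 0.6, 0.6, 0.6],
    [0.4, 0.5, 0.9, 0.9],
    [0.4, 0.1, 0.5, 0.5],
    [0.4, 0.1, 0.5, 0.5],
]


def condorcet_winner(P: List[List[float]]) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 1/2,
    or None if no such arm exists."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def borda_scores(P: List[List[float]]) -> List[float]:
    """Return each arm's average probability of beating another arm."""
    K = len(P)
    return [
        sum(P[i][j] for j in range(K) if j != i) / (K - 1)
        for i in range(K)
    ]


scores = borda_scores(P)
borda_winner = max(range(len(P)), key=scores.__getitem__)

print("Condorcet winner:", condorcet_winner(P))  # arm 0
print("Borda winner:", borda_winner)             # arm 1
```

In this made-up instance arm 0 narrowly beats every other arm and so is the Condorcet winner, while arm 1 loses to arm 0 but beats the remaining arms by a wide margin and so has the highest Borda score.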

Taxonomy

Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Stochastic Gradient Optimization Techniques