Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives
S Akash, Pratik Gajane, Jawar Singh

TL;DR
This paper introduces the first unified algorithms for multi-dueling bandits that perform optimally in both stochastic and adversarial environments under Condorcet and Borda objectives, without prior knowledge of the environment.
Contribution
It proposes the MetaDueling and SA-MiDEX algorithms, which achieve best-of-both-worlds regret bounds for multi-dueling bandits under the Condorcet and Borda objectives.
Findings
MetaDueling converts any dueling bandit algorithm into a multi-dueling bandit algorithm.
The combined algorithms achieve near-optimal regret bounds in both stochastic and adversarial settings.
Matching lower bounds are provided for the Condorcet setting, confirming near-optimality.
Abstract
Multi-dueling bandits, where a learner selects multiple arms per round and observes only the winner, arise naturally in many applications including ranking and recommendation systems, yet a fundamental question has remained open: can a single algorithm perform optimally in both stochastic and adversarial environments, without knowing which regime it faces? We answer this affirmatively, providing the first best-of-both-worlds algorithms for multi-dueling bandits under both Condorcet and Borda objectives. For the Condorcet setting, we propose MetaDueling, a black-box reduction that converts any dueling bandit algorithm into a multi-dueling bandit algorithm by transforming multi-way winner feedback into an unbiased pairwise signal. Instantiating our reduction with a suitable dueling bandit algorithm yields the first best-of-both-worlds algorithm for multi-dueling bandits: it achieves…
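To make the two objectives concrete, here is a minimal Python sketch that is not taken from the paper. It uses the standard definitions: a Condorcet winner beats every other arm with probability above 1/2, and an arm's Borda score is its average win probability against the other arms. The preference matrix `EXAMPLE_P` and all function names are illustrative assumptions, chosen so that the two objectives pick different arms.

```python
from typing import List, Optional

# Hypothetical preference matrix: EXAMPLE_P[i][j] is the probability
# that arm i beats arm j. Diagonal entries are unused placeholders.
EXAMPLE_P: List[List[float]] = [
    [0.50, 0.55, 0.55],
    [0.45, 0.50, 0.95],
    [0.45, 0.05, 0.50],
]


def condorcet_winner(P: List[List[float]]) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 1/2.

    Returns None when no such arm exists.
    """
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None


def borda_scores(P: List[List[float]]) -> List[float]:
    """Average win probability of each arm against all other arms."""
    n = len(P)
    return [
        sum(P[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def borda_winner(P: List[List[float]]) -> int:
    """Return the arm with the highest Borda score."""
    scores = borda_scores(P)
    return max(range(len(scores)), key=lambda i: scores[i])
```

In this assumed example, arm 0 is the Condorcet winner because it narrowly beats both other arms, while arm 1 has the highest Borda score because it beats arm 2 by a wide margin. This illustrates why the two objectives can call for different algorithms and analyses.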
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Stochastic Gradient Optimization Techniques
