Adversarial Multi-dueling Bandits

Pratik Gajane

arXiv:2406.12475 · cs.LG · June 27, 2024

TL;DR

This paper introduces the adversarial multi-dueling bandits problem and proposes MiDEX, an algorithm whose expected regret against a Borda winner is near-optimal when preferences are chosen adversarially.

Contribution

The paper formulates the adversarial multi-dueling bandits problem, introduces the MiDEX algorithm, and provides theoretical regret bounds demonstrating near-optimal performance.

Findings

01

MiDEX achieves an expected regret of O((K log K)^{1/3} T^{2/3}).

02

A lower bound of Ω(K^{1/3} T^{2/3}) matches the upper bound up to a (log K)^{1/3} factor, showing near-optimality.

03

The problem setting extends multi-dueling bandits to adversarial preferences, which had previously been studied only in dueling bandits.
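As a rough illustration of how close the two bounds listed above are, the sketch below evaluates both growth rates with all constants ignored. The function names are hypothetical and chosen only for this example; the paper's actual constants are not reproduced here.

```python
import math


def upper_bound_rate(K: int, T: int) -> float:
    """Growth rate (K log K)^{1/3} T^{2/3} of the upper bound, constants ignored."""
    return (K * math.log(K)) ** (1 / 3) * T ** (2 / 3)


def lower_bound_rate(K: int, T: int) -> float:
    """Growth rate K^{1/3} T^{2/3} of the lower bound, constants ignored."""
    return K ** (1 / 3) * T ** (2 / 3)


def gap(K: int, T: int) -> float:
    """Ratio of the two rates; algebraically this equals (log K)^{1/3}."""
    return upper_bound_rate(K, T) / lower_bound_rate(K, T)


if __name__ == "__main__":
    for K in (10, 100, 1000):
        print(K, round(gap(K, T=10_000), 3))
```

The ratio does not depend on T, so the two bounds differ only by a factor that grows very slowly with the number of arms K.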

Abstract

We introduce the problem of regret minimization in adversarial multi-dueling bandits. While adversarial preferences have been studied in dueling bandits, they have not been explored in multi-dueling bandits. In this setting, the learner is required to select $m \geq 2$ arms at each round and observes as feedback the identity of the most preferred arm which is based on an arbitrary preference matrix chosen obliviously. We introduce a novel algorithm, MiDEX (Multi Dueling EXP3), to learn from such preference feedback that is assumed to be generated from a pairwise-subset choice model. We prove that the expected cumulative $T$-round regret of MiDEX compared to a Borda-winner from a set of $K$ arms is upper bounded by $O((K \log K)^{1/3} T^{2/3})$. Moreover, we prove a lower bound of $\Omega(K^{1/3} T^{2/3})$ for the expected regret in this setting which demonstrates that our proposed…
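The feedback model and the Borda winner mentioned in the abstract can be sketched for a single round as follows. This is a minimal sketch, not the paper's code. It assumes the common form of the pairwise-subset choice model, in which a pair of the selected arms is drawn uniformly at random and the reported winner is the winner of that duel, and it assumes the standard Borda score (the average probability of beating each other arm). The matrix `P` and all function names are hypothetical. In the adversarial setting the matrix may change every round; the sketch handles one fixed matrix only.

```python
import random


def borda_scores(P):
    """Assumed Borda score: average probability that arm i beats each other arm."""
    K = len(P)
    return [sum(P[i][j] for j in range(K) if j != i) / (K - 1) for i in range(K)]


def borda_winner(P):
    """Index of an arm with the highest Borda score for this matrix."""
    scores = borda_scores(P)
    return max(range(len(P)), key=scores.__getitem__)


def winner_probabilities(P, selected):
    """Probability that each selected position is reported as the winner.

    Assumed pairwise-subset model: draw one of the m(m-1)/2 pairs of
    positions uniformly, then the winner of that duel is reported.
    Requires P[i][j] + P[j][i] == 1 and P[i][i] == 0.5.
    """
    m = len(selected)
    probs = [0.0] * m
    for a in range(m):
        for b in range(m):
            if a != b:
                probs[a] += 2 * P[selected[a]][selected[b]] / (m * (m - 1))
    return probs


def sample_winner(P, selected, rng=random):
    """Draw the observed feedback: the identity of the most preferred arm."""
    weights = winner_probabilities(P, selected)
    return rng.choices(selected, weights=weights, k=1)[0]


# Hypothetical 3-arm preference matrix; P[i][j] is the chance that i beats j.
P = [
    [0.5, 0.7, 0.6],
    [0.3, 0.5, 0.8],
    [0.4, 0.2, 0.5],
]

if __name__ == "__main__":
    print("Borda scores:", borda_scores(P))
    print("Borda winner:", borda_winner(P))
    print("Observed winner:", sample_winner(P, [0, 1, 2]))
```

Under these assumptions the winner probabilities over the selected arms always sum to one, because each drawn pair produces exactly one winner.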

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Security in Wireless Sensor Networks

Methods: Sparse Evolutionary Training