
TL;DR
This paper introduces the adversarial multi-dueling bandits problem and proposes MiDEX, an EXP3-based algorithm whose expected regret against a Borda winner is near-optimal.
Contribution
The paper formulates the adversarial multi-dueling bandits problem, introduces the MiDEX algorithm, and provides theoretical regret bounds demonstrating near-optimal performance.
Findings
MiDEX achieves an expected regret of O((K log K)^{1/3} T^{2/3}).
A lower bound of Ω(K^{1/3} T^{2/3}) matches the upper bound up to a (log K)^{1/3} factor, showing near-optimality.
The problem setting extends adversarial preferences, previously studied only in dueling bandits, to multi-dueling bandits.
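To make the gap between the two bounds concrete, the following minimal sketch evaluates both rates with all constants dropped. The function names are hypothetical and not from the paper, and the numbers are growth rates only, not actual regret values. The ratio of the two rates is (log K)^{1/3}, which does not depend on T.

```python
import math

def upper_bound_rate(K: int, T: int) -> float:
    """(K log K)^{1/3} * T^{2/3}: the upper-bound rate, constants dropped."""
    return (K * math.log(K)) ** (1 / 3) * T ** (2 / 3)

def lower_bound_rate(K: int, T: int) -> float:
    """K^{1/3} * T^{2/3}: the lower-bound rate, constants dropped."""
    return K ** (1 / 3) * T ** (2 / 3)

def gap(K: int, T: int) -> float:
    """Ratio of the two rates; algebraically equal to (log K)^{1/3}."""
    return upper_bound_rate(K, T) / lower_bound_rate(K, T)

if __name__ == "__main__":
    # K must be at least 2 so that log K is positive.
    for K in (2, 10, 100, 1000):
        print(K, round(gap(K, 10_000), 3))
```

Because the ratio grows only like the cube root of log K, the gap stays small even for large numbers of arms.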
Abstract
We introduce the problem of regret minimization in adversarial multi-dueling bandits. While adversarial preferences have been studied in dueling bandits, they have not been explored in multi-dueling bandits. In this setting, the learner is required to select m ≥ 2 arms at each round and observes as feedback the identity of the most preferred arm, which is based on an arbitrary preference matrix chosen obliviously. We introduce a novel algorithm, MiDEX (Multi Dueling EXP3), to learn from such preference feedback, which is assumed to be generated from a pairwise-subset choice model. We prove that the expected cumulative T-round regret of MiDEX compared to a Borda-winner from a set of K arms is upper bounded by O((K log K)^{1/3} T^{2/3}). Moreover, we prove a lower bound of Ω(K^{1/3} T^{2/3}) for the expected regret in this setting which demonstrates that our proposed…
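The abstract names a Borda winner and a pairwise-subset choice model without defining them. The sketch below illustrates one common reading and is not taken from the paper: it assumes the Borda score of arm i is the average of P[i][j] over all other arms j, and that the winner among the selected arms is the winner of a uniformly random pair of selected positions. The preference matrix EXAMPLE_P and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical 3-arm preference matrix: P[i][j] is the probability that
# arm i is preferred over arm j, with P[i][j] + P[j][i] = 1, P[i][i] = 0.5.
EXAMPLE_P: Matrix = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.8],
    [0.3, 0.2, 0.5],
]

def borda_scores(P: Matrix) -> List[float]:
    """Assumed definition: average preference of each arm over all others."""
    K = len(P)
    return [sum(P[i][j] for j in range(K) if j != i) / (K - 1) for i in range(K)]

def borda_winner(P: Matrix) -> int:
    """Index of an arm with the highest Borda score."""
    scores = borda_scores(P)
    return max(range(len(P)), key=lambda i: scores[i])

def subset_win_probs(P: Matrix, arms: List[int]) -> List[float]:
    """Assumed pairwise-subset model: the position-a entry is the chance that
    position a wins when a random pair of positions is drawn and dueled."""
    m = len(arms)
    scale = 2.0 / (m * (m - 1))  # 1 / (number of unordered position pairs)
    return [
        scale * sum(P[arms[a]][arms[b]] for b in range(m) if b != a)
        for a in range(m)
    ]

if __name__ == "__main__":
    print(borda_scores(EXAMPLE_P), borda_winner(EXAMPLE_P))
    print(subset_win_probs(EXAMPLE_P, [0, 1, 2]))
```

Indexing by position rather than by arm lets the same arm appear more than once in a selection, in which case the duplicate entries duel each other with probability 0.5.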
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Security in Wireless Sensor Networks
Methods: Sparse Evolutionary Training
