Dueling Bandits: From Two-dueling to Multi-dueling
Yihan Du, Siwei Wang, Longbo Huang

TL;DR
This paper introduces new algorithms for the multi-dueling bandit problem, extending the traditional two-dueling setting to multiple options, with theoretical guarantees and empirical validation showing improved regret bounds and performance.
Contribution
It proposes the DoublerBAI, MultiSBM-Feedback, and MultiRUCB algorithms, providing regret bounds and finite-time analysis for the multi-dueling bandit problem, a significant generalization of the classic two-dueling setting.
Findings
The algorithms achieve $O(\ln T)$ regret bounds.
MultiSBM-Feedback reduces the constant factor in its regret bound by almost half compared to benchmark results.
Empirical results show the proposed algorithms outperforming existing algorithms on synthetic and real data.
Abstract
We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms. This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options. We start with the two-dueling bandit setting and propose two efficient algorithms, DoublerBAI and MultiSBM-Feedback. DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$. MultiSBM-Feedback not only has an optimal regret, but also reduces the constant factor by almost a half compared to benchmark results. Then, we consider the general multi-dueling case and develop an efficient algorithm MultiRUCB. Using a novel finite-time regret…
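To make the setting in the abstract concrete, here is a minimal Python sketch of one round of multi-dueling feedback. It is an illustration only and does not implement DoublerBAI, MultiSBM-Feedback, or MultiRUCB. The preference matrix `P`, the helper names (`condorcet_winner`, `duel_round`, `round_regret`), and the regret definition (the average gap between the best arm's win probability against each selected arm and 1/2) are assumptions drawn from common dueling bandit conventions, not details confirmed by this summary.

```python
import itertools
import random


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None


def duel_round(P, subset, rng):
    """Sample one noisy comparison for every pair of distinct arms in `subset`.

    Returns a dict mapping (i, j) with i < j to True if arm i beat arm j.
    """
    arms = sorted(set(subset))
    return {
        (i, j): rng.random() < P[i][j]
        for i, j in itertools.combinations(arms, 2)
    }


def round_regret(P, subset, winner):
    """Assumed regret for one round: mean of P[winner][a] - 1/2 over selected arms."""
    return sum(P[winner][a] - 0.5 for a in subset) / len(subset)


# Hypothetical 3-arm instance: P[i][j] is the probability that arm i beats arm j.
P = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]

rng = random.Random(0)
best = condorcet_winner(P)
outcomes = duel_round(P, [0, 1, 2], rng)
print("Condorcet winner:", best)
print("Pairwise outcomes:", outcomes)
print("Regret of comparing all arms:", round_regret(P, [0, 1, 2], best))
print("Regret of selecting only the winner:", round_regret(P, [best], best))
```

Under this assumed definition, a round that selects only the best arm incurs zero regret, while every suboptimal arm added to the comparison set raises the round's regret. Any algorithm for this setting has to manage that tension between gathering comparisons and avoiding suboptimal arms.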
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
