Dueling Bandits: From Two-dueling to Multi-dueling

Yihan Du; Siwei Wang; Longbo Huang

arXiv:2211.10293 · cs.LG · November 21, 2022 · 1 citation

TL;DR

This paper introduces new algorithms for the multi-dueling bandit problem, extending the traditional two-dueling setting to multiple options, with theoretical guarantees and empirical validation showing improved regret bounds and performance.

Contribution

It proposes the DoublerBAI, MultiSBM-Feedback, and MultiRUCB algorithms, providing regret bounds and finite-time analysis for the multi-dueling bandit problem, a significant generalization of the classic two-dueling setting.

Findings

01

Algorithms achieve $O(\ln T)$ regret bounds.

02

MultiSBM-Feedback reduces the constant factor in its regret bound by almost half compared to benchmark results.

03

Empirical results show that the proposed algorithms outperform existing algorithms on synthetic and real data.

Abstract

We study a general multi-dueling bandit problem, where an agent compares multiple options simultaneously and aims to minimize the regret due to selecting suboptimal arms. This setting generalizes the traditional two-dueling bandit problem and finds many real-world applications involving subjective feedback on multiple options. We start with the two-dueling bandit setting and propose two efficient algorithms, DoublerBAI and MultiSBM-Feedback. DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$. MultiSBM-Feedback not only has an optimal $O(\ln T)$ regret, but also reduces the constant factor by almost a half compared to benchmark results. Then, we consider the general multi-dueling case and develop an efficient algorithm MultiRUCB. Using a novel finite-time regret…
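To make the two-dueling setting in the abstract concrete, the following is a minimal Python sketch of the feedback model and a regret computation. It is not any of the paper's algorithms: it assumes the Condorcet-winner regret notion common in the dueling bandit literature (the paper's exact definition may differ), and the preference matrix `PREF`, the helper names, and the uniform-random baseline policy are all hypothetical choices made for illustration.

```python
import random

# Hypothetical preference matrix: PREF[i][j] is the assumed probability
# that arm i beats arm j in a single comparison. Arm 0 beats every other
# arm with probability above 0.5, so it is the best (Condorcet) arm.
PREF = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]


def duel(pref, i, j, rng):
    """Return True if arm i wins one noisy comparison against arm j."""
    return rng.random() < pref[i][j]


def pair_regret(pref, best, i, j):
    """Average preference gap of the best arm over the two dueled arms.

    This is an assumed (standard) regret notion, not taken from the paper.
    """
    return ((pref[best][i] - 0.5) + (pref[best][j] - 0.5)) / 2


def run_uniform(pref, best, horizon, seed=0):
    """Duel uniformly random pairs; return cumulative regret and win counts.

    Uniform sampling is only a naive baseline: its regret grows linearly
    in the horizon, unlike the logarithmic bounds quoted in the abstract.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    total = 0.0
    for _ in range(horizon):
        i, j = rng.randrange(k), rng.randrange(k)
        if duel(pref, i, j, rng):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        total += pair_regret(pref, best, i, j)
    return total, wins
```

Calling `run_uniform(PREF, best=0, horizon=1000)` returns the cumulative regret of the baseline together with the observed win counts. A learning algorithm would use those counts to concentrate its duels on the best arm, which is what drives regret down from linear to logarithmic growth.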

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning