Online Clustering of Dueling Bandits
Zhiyong Wang, Jiahang Sun, Mingze Kong, Jize Xie, Qinghua Hu, John C.S. Lui, Zhongxiang Dai

TL;DR
This paper introduces clustering algorithms that let users collaborate in dueling bandit problems with preference feedback, supported by theoretical regret guarantees and empirical validation.
Contribution
It presents the first clustering algorithms for dueling bandits that operate with preference feedback, including linear and neural models, with theoretical analysis and empirical results.
Findings
The proposed algorithms outperform non-collaborative methods on synthetic datasets.
Theoretical regret bounds demonstrate improved performance with user collaboration.
Empirical results on real-world data validate the effectiveness of the proposed methods.
Abstract
The contextual multi-armed bandit (MAB) is a widely used framework for problems requiring sequential decision-making under uncertainty, such as recommendation systems. In applications involving a large number of users, the performance of contextual MAB can be significantly improved by facilitating collaboration among multiple users. This has been achieved by the clustering of bandits (CB) methods, which adaptively group the users into different clusters and achieve collaboration by allowing the users in the same cluster to share data. However, classical CB algorithms typically rely on numerical reward feedback, which may not be practical in certain real-world applications. For instance, in recommendation systems, it is more realistic and reliable to solicit preference feedback between pairs of recommended items rather than absolute rewards. To address this limitation, we introduce the…
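The two ideas in the abstract, preference feedback between a pair of items and data sharing within a cluster, can be illustrated with a minimal Python sketch. This is not the paper's algorithm. It assumes a linear latent utility and a logistic (Bradley-Terry-Luce style) link for the duel outcome, neither of which is stated in the text above, and every name in it (`utility`, `preference_prob`, `sample_duel`, `pooled_duels`, `theta_shared`) is hypothetical.

```python
import math
import random


def utility(theta, x):
    """Latent utility of item x for a user with preference vector theta (assumed linear)."""
    return sum(t * xi for t, xi in zip(theta, x))


def preference_prob(theta, x1, x2):
    """Probability that x1 is preferred over x2 (assumed logistic link on the utility gap)."""
    gap = utility(theta, x1) - utility(theta, x2)
    return 1.0 / (1.0 + math.exp(-gap))


def sample_duel(theta, x1, x2, rng):
    """Simulate one round of preference feedback: 1 if x1 wins the duel, else 0."""
    return 1 if rng.random() < preference_prob(theta, x1, x2) else 0


def pooled_duels(cluster_users, history):
    """Gather the duel records of every user in a cluster, mimicking data sharing."""
    return [rec for u in cluster_users for rec in history.get(u, [])]


rng = random.Random(0)
theta_shared = [1.0, -0.5]  # hypothetical preference vector shared by users 0 and 1

history = {}
for user in (0, 1):
    history[user] = []
    for _ in range(5):
        x1 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        x2 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        history[user].append((x1, x2, sample_duel(theta_shared, x1, x2, rng)))

# Users in the same cluster estimate their preferences from the pooled records.
shared = pooled_duels([0, 1], history)
print(len(shared))  # 10 records: 5 duels from each of the 2 users
```

In this sketch, each user alone has only five binary duel outcomes, while the cluster has ten. That is the intuition behind collaboration: users with similar preferences learn from a larger pool of comparisons than any one of them could collect individually.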
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Advanced Bandit Algorithms Research · Personal Information Management and User Behavior
