Online Clustering of Bandits with Misspecified User Models
Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, John C.S. Lui

TL;DR
This paper introduces robust clustering-of-bandits algorithms for contextual linear bandits with misspecified user models, proves regret bounds for them, and shows improved performance over existing methods.
Contribution
First work to address clustering of bandits with misspecified user models; proposes two algorithms with proven regret bounds under milder assumptions than previous clustering-of-bandits (CB) works.
Findings
Regret bounds match lower bounds asymptotically in T.
Algorithms outperform previous methods on synthetic and real data.
Handles model misspecification effectively.
Abstract
The contextual linear bandit is an important online learning problem in which, given arm features, a learning agent selects an arm at each round to maximize its cumulative reward in the long run. A line of work called clustering of bandits (CB) exploits the collaborative effect among user preferences and has shown significant improvements over classic linear bandit algorithms. However, existing CB algorithms require well-specified linear user models and can fail when this critical assumption does not hold. Whether robust CB algorithms can be designed for more practical scenarios with misspecified user models remains an open problem. In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM), where the expected rewards in user models can be perturbed away from perfect linear models. We devise two robust CB…
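To make the CBMUM setting in the abstract concrete, here is a minimal Python sketch of a clustered, misspecified user model. It assumes (an illustration choice, not taken from the text above) that a user's expected reward for arm features x is a linear term with a preference vector shared by all users in the same cluster, plus a deviation bounded in absolute value by a constant EPS_STAR. The names make_env, misspecification, expected_reward, and sample_reward, the sinusoidal deviation, and the Gaussian noise are all hypothetical; this is not the authors' code or their algorithms.

```python
import math
import random

# Hypothetical sizes chosen for illustration only.
D = 3            # feature dimension
EPS_STAR = 0.05  # assumed bound on the misspecification |eta(x)|


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(u * v for u, v in zip(a, b))


def make_env(num_users, num_clusters, seed=0):
    """Draw one preference vector per cluster and assign users to clusters."""
    rng = random.Random(seed)
    # All users in cluster j share the same vector thetas[j].
    thetas = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(num_clusters)]
    # The learner would NOT observe this user -> cluster map.
    cluster_of = [rng.randrange(num_clusters) for _ in range(num_users)]
    return thetas, cluster_of


def misspecification(x, user):
    """Hypothetical deviation from linearity; any |eta| <= EPS_STAR fits."""
    return EPS_STAR * math.sin(sum(x) + user)


def expected_reward(x, user, thetas, cluster_of):
    """Linear part from the user's cluster plus the bounded perturbation."""
    theta = thetas[cluster_of[user]]
    return dot(x, theta) + misspecification(x, user)


def sample_reward(x, user, thetas, cluster_of, rng, noise_sd=0.1):
    """Noisy reward the learner would observe after pulling arm x."""
    return expected_reward(x, user, thetas, cluster_of) + rng.gauss(0.0, noise_sd)


if __name__ == "__main__":
    thetas, cluster_of = make_env(num_users=5, num_clusters=2, seed=0)
    rng = random.Random(1)
    x = [rng.uniform(-1, 1) for _ in range(D)]
    print(sample_reward(x, user=0, thetas=thetas, cluster_of=cluster_of, rng=rng))
```

Because the deviation can reach EPS_STAR for any arm, a learner that trusts a purely linear estimate can be systematically off by that amount, which is the kind of gap the abstract says existing CB algorithms are not designed to tolerate.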
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Recommender Systems and Techniques
