Multi-User Contextual Cascading Bandits for Personalized Recommendation
Jiho Park, Huiwen Jia

TL;DR
This paper introduces a new multi-user bandit model for personalized recommendations, proposing algorithms with provable regret bounds and demonstrating their effectiveness through experiments.
Contribution
It develops the Multi-User Contextual Cascading Bandit model and proposes two algorithms with theoretical guarantees, addressing simultaneous multi-user interactions in recommendation systems.
Findings
UCBBP achieves regret of O(…)
Abstract
We introduce a Multi-User Contextual Cascading Bandit (MCCB) model, a new combinatorial bandit framework that captures realistic online advertising scenarios in which multiple users simultaneously interact with sequentially displayed items. Unlike classical contextual bandits, MCCB integrates three key structural elements: (i) cascading feedback based on sequential arm exposure, (ii) parallel context sessions enabling selective exploration, and (iii) heterogeneous arm-level rewards. We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove a regret bound for it in terms of the number of episodes, session steps, and contexts per episode. Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, termed Active Upper Confidence Bound with…
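As a rough illustration of two ingredients the abstract names, cascading feedback and UCB-style scoring, the following Python sketch may help. It is not the paper's UCBBP algorithm: it omits backward planning, parallel sessions, and heterogeneous rewards, and it assumes a linear reward model with binary clicks. The names `linear_ucb_scores`, `cascade_feedback`, `theta_hat`, `A_inv`, and `alpha` are hypothetical and chosen only for this example.

```python
import numpy as np


def linear_ucb_scores(contexts, theta_hat, A_inv, alpha=1.0):
    """Optimistic score per arm: estimated mean plus an exploration bonus.

    Assumption: rewards are linear in the arm context (standard LinUCB form).
    """
    means = contexts @ theta_hat
    # Quadratic form x^T A_inv x for each row x of `contexts`.
    widths = np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
    return means + alpha * widths


def cascade_feedback(ranked_arms, clicked):
    """Return the (arm, reward) pairs revealed under cascading feedback.

    The user examines arms in display order and stops at the first click,
    so arms placed after the clicked one are never observed.
    """
    observed = []
    for arm in ranked_arms:
        reward = 1 if clicked[arm] else 0
        observed.append((arm, reward))
        if reward:
            break
    return observed


# Toy usage with made-up numbers (three arms, three-dimensional contexts).
contexts = np.eye(3)
theta_hat = np.array([0.2, 0.5, 0.1])
A_inv = np.eye(3)

scores = linear_ucb_scores(contexts, theta_hat, A_inv, alpha=1.0)
ranking = [int(a) for a in np.argsort(-scores)]  # highest score shown first
clicked = {0: True, 1: False, 2: True}           # hypothetical user reactions
observed = cascade_feedback(ranking, clicked)
print(ranking, observed)
```

In this toy run the third-ranked arm would have been clicked, but the learner never sees that, because the user stops at an earlier item. This partial observability is what distinguishes cascading feedback from standard bandit feedback.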
