Multi-User Contextual Cascading Bandits for Personalized Recommendation

Jiho Park; Huiwen Jia

arXiv:2508.13981·cs.LG·August 26, 2025

TL;DR

This paper introduces a new multi-user bandit model for personalized recommendations, proposing algorithms with provable regret bounds and demonstrating their effectiveness through experiments.

Contribution

It develops the Multi-User Contextual Cascading Bandit model and proposes two algorithms with theoretical guarantees, addressing simultaneous multi-user interactions in recommendation systems.

Findings

01

UCBBP achieves regret of O(…

Abstract

We introduce the Multi-User Contextual Cascading Bandit (MCCB) model, a new combinatorial bandit framework that captures realistic online advertising scenarios in which multiple users simultaneously interact with sequentially displayed items. Unlike classical contextual bandits, MCCB integrates three key structural elements: (i) cascading feedback based on sequential arm exposure, (ii) parallel context sessions enabling selective exploration, and (iii) heterogeneous arm-level rewards. We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove that it achieves a regret bound of $\widetilde{O}(\sqrt{THN})$ over $T$ episodes, $H$ session steps, and $N$ contexts per episode. Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, termed Active Upper Confidence Bound with…
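The abstract does not spell out UCBBP, so the Python sketch below only illustrates two ingredients it names, under simplifying assumptions of our own: cascading feedback (a user scans displayed items in order, and items after the first click are never observed) and UCB-style optimism. Reading "backward planning" as evaluating a display sequence from the last session step backward is our interpretation, not the authors' definition. Every function name is hypothetical; contexts, the $N$ parallel users, and the paper's estimators are all omitted, and the per-arm click estimate is a generic UCB1-style stand-in.

```python
import math
from itertools import permutations

# Hypothetical illustration, NOT the paper's UCBBP: a single user, no contexts,
# independent per-arm click probabilities, heterogeneous per-arm rewards.

def cascade_value(probs, rewards):
    """Expected reward of one cascaded session: step h is examined only if no
    earlier item was clicked; a click on item h earns rewards[h] and ends it."""
    value = 0.0  # V_{H+1} = 0: nothing left to show after the last step
    # Backward recursion V_h = p_h * r_h + (1 - p_h) * V_{h+1}
    for p, r in zip(reversed(probs), reversed(rewards)):
        value = p * r + (1.0 - p) * value
    return value

def optimistic_prob(clicks, exams, t, c=1.0):
    """UCB1-style optimistic click probability for one arm, clipped to [0, 1]."""
    if exams == 0:
        return 1.0  # unexamined arms get the most optimistic estimate
    bonus = c * math.sqrt(2.0 * math.log(max(t, 1)) / exams)
    return min(1.0, clicks / exams + bonus)

def plan_order(probs, rewards, horizon):
    """Brute-force the ordered list of `horizon` distinct arms that maximizes
    cascade_value under the given (e.g. optimistic) click probabilities."""
    best, best_val = None, -1.0
    for order in permutations(range(len(probs)), horizon):
        val = cascade_value([probs[a] for a in order],
                            [rewards[a] for a in order])
        if val > best_val:
            best, best_val = order, val
    return best

def exposed_feedback(order, first_click):
    """Cascading feedback: only arms up to and including the first click are
    observed. first_click is a position in `order`, or None if no click."""
    stop = len(order) if first_click is None else first_click + 1
    return [(arm, pos == first_click) for pos, arm in enumerate(order[:stop])]
```

A learner in this toy setting would feed `optimistic_prob` estimates into `plan_order`, display the returned order, and update click and examination counts only for the pairs returned by `exposed_feedback`. Brute force over `permutations` grows as K!/(K-H)! for K arms, so it suits toy sizes only; for a fixed set of arms, an adjacent-swap argument shows that sorting by reward in descending order already maximizes `cascade_value`.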
