Low-rank Bandits with Latent Mixtures
Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki

TL;DR
This paper introduces a bandit algorithm for recommender systems that models users as latent mixtures of classes, leveraging low-rank structures and tensor methods to achieve regret bounds in dynamic user interactions.
Contribution
It develops a novel algorithm combining tensor power methods with linear bandits, providing the first rigorous regret analysis for low-rank bandits with latent user mixtures.
Findings
Regret after $T$ interactions is $\tilde O(C\sqrt{BT})$.
Algorithm effectively handles two-sided uncertainty in user and item features.
Provides a new robustness property of OFUL for low-rank bandit problems.
Abstract
We study the task of maximizing rewards from recommending items (actions) to users sequentially interacting with a recommender system. Users are modeled as latent mixtures of $C$ many representative user classes, where each class specifies a mean reward profile across actions. Both the user features (mixture distribution over classes) and the item features (mean reward vector per class) are unknown a priori. The user identity is the only contextual information available to the learner while interacting. This induces a low-rank structure on the matrix of expected rewards $r_{a,b}$ from recommending item $a$ to user $b$. The problem reduces to the well-known linear bandit when either user or item-side features are perfectly known. In the setting where each user, with its stochastically sampled taste profile, interacts only for a small number of sessions, we develop a bandit algorithm for the…
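The low-rank structure described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, only a synthetic construction of the reward model: the names `class_profiles`, `user_mixtures`, and `make_reward_matrix`, the uniform and Dirichlet sampling choices, and the problem sizes are all assumptions made for illustration. Each expected reward $r_{a,b}$ is the inner product of user $b$'s mixture vector with item $a$'s per-class mean rewards, so the reward matrix has rank at most $C$.

```python
import numpy as np


def make_reward_matrix(num_items, num_users, num_classes, seed=0):
    """Build a synthetic expected-reward matrix with latent user mixtures.

    All distributions here are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Item features (assumed): mean reward of each item under each class, in [0, 1].
    class_profiles = rng.uniform(0.0, 1.0, size=(num_classes, num_items))
    # User features (assumed): each row is a mixture distribution over classes.
    user_mixtures = rng.dirichlet(np.ones(num_classes), size=num_users)
    # r[a, b] = sum_c user_mixtures[b, c] * class_profiles[c, a]
    rewards = class_profiles.T @ user_mixtures.T
    return rewards, class_profiles, user_mixtures


rewards, class_profiles, user_mixtures = make_reward_matrix(
    num_items=20, num_users=50, num_classes=3
)

# The 20 x 50 matrix factors through 3 classes, so its rank is at most 3.
rank = np.linalg.matrix_rank(rewards)
print("shape:", rewards.shape, "rank:", rank)

# With user features known, the reward of item a for user b is linear in
# the item's unknown per-class vector: the linear-bandit special case.
a, b = 4, 7
linear_form = user_mixtures[b] @ class_profiles[:, a]
print("linear form matches matrix entry:", np.isclose(linear_form, rewards[a, b]))
```

If either factor is handed to the learner, each row or column of the matrix becomes a linear function of a known feature vector, which is why the abstract says the problem reduces to a linear bandit in that case. The difficulty the paper addresses is that both factors are unknown.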
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management
