Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation
Imad Aouali, Achraf Ait Sidi Hammou, Otmane Sakhi, David Rohde, Flavian Vasile

TL;DR
This paper presents PRR, a scalable probabilistic model for personalized slate recommendation. It estimates rewards efficiently, outperforms existing off-policy methods, and suits low-latency applications such as advertising.
Contribution
Introduction of PRR, a novel probabilistic model enabling scalable off-policy reward estimation for slate recommendation systems.
Findings
PRR outperforms existing off-policy reward methods.
PRR is highly scalable to large action spaces.
PRR enables fast recommendations using maximum inner product search.
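To make the last finding concrete, here is a minimal sketch of slate selection by maximum inner product search. It uses an exact brute-force search in NumPy rather than the approximate index a production system would likely use, and the names `top_k_slate`, `user_vec`, and `item_embs` are hypothetical, not taken from the paper.

```python
import numpy as np

def top_k_slate(user_vec, item_embs, k):
    """Return the indices of the k items with the largest inner product, best first."""
    scores = item_embs @ user_vec                 # one inner product per item
    # argpartition isolates the k best items in linear time ...
    top = np.argpartition(-scores, k - 1)[:k]
    # ... and only those k are then fully sorted.
    return top[np.argsort(-scores[top])]

# Toy data (assumed): 1000 items and one user, both embedded in 16 dimensions.
rng = np.random.default_rng(0)
item_embs = rng.normal(size=(1000, 16))
user_vec = rng.normal(size=16)

slate = top_k_slate(user_vec, item_embs, k=5)
```

Because the score is a plain inner product, the brute-force step can be swapped for an approximate nearest-neighbor index when the catalog is too large to scan per request.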
Abstract
We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining the reward (whether the user successfully interacted with the slate) and the rank (the item that was selected within the slate). PRR outperforms existing off-policy reward-optimizing methods and is far more scalable to large action spaces. Moreover, PRR allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable for low-latency domains such as computational advertising.
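One way to picture how reward and rank can be combined is a single categorical distribution over K + 1 outcomes: no interaction, or an interaction with the item at position 1 through K. The sketch below is an illustrative parametrization under that assumption, not necessarily the one used in the paper; `outcome_probs`, `item_scores`, `position_weights`, and `no_click_score` are hypothetical names.

```python
import numpy as np

def outcome_probs(item_scores, position_weights, no_click_score):
    """Softmax over K + 1 outcomes: index 0 is 'no interaction', index k is 'clicked position k'."""
    item_scores = np.asarray(item_scores, dtype=float)
    position_weights = np.asarray(position_weights, dtype=float)
    # Each slot's logit is its item score plus the log of a positive position weight.
    logits = np.concatenate(([no_click_score], item_scores + np.log(position_weights)))
    logits -= logits.max()                        # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy slate of K = 3 items (all values assumed for illustration).
probs = outcome_probs(item_scores=[1.2, 0.3, -0.5],
                      position_weights=[1.0, 0.6, 0.3],
                      no_click_score=2.0)
p_success = 1.0 - probs[0]    # probability that the user interacts with the slate at all
```

Under this view the reward signal fixes how much mass falls on outcome 0, while the rank signal fixes how the remaining mass is split across positions.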
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
