Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation

Imad Aouali; Achraf Ait Sidi Hammou; Otmane Sakhi; David Rohde; Flavian Vasile

arXiv:2208.06263 · cs.IR · July 8, 2024 · 1 citation

TL;DR

This paper presents PRR, a scalable probabilistic model for personalized slate recommendation. It estimates slate rewards efficiently, outperforms existing off-policy reward-optimizing methods, and is fast enough for low-latency applications such as computational advertising.

Contribution

Introduction of PRR, a novel probabilistic model enabling scalable off-policy reward estimation for slate recommendation systems.
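The page does not say which off-policy estimator is built on top of PRR. As a hedged illustration of what model-based off-policy reward estimation means in general, the sketch below implements the standard direct method: average a fitted reward model's predicted success probability over logged contexts, for the slates a new policy would show. The function and argument names (`direct_method_value`, `success_prob`, `target_policy`) are hypothetical, and the reward model is passed in as a plain callable rather than PRR itself.

```python
from typing import Any, Callable, Sequence


def direct_method_value(contexts: Sequence[Any],
                        target_policy: Callable[[Any], Sequence[int]],
                        success_prob: Callable[[Any, Sequence[int]], float]) -> float:
    """Model-based (direct method) estimate of a target policy's expected slate reward.

    contexts:      user contexts taken from the logged data
    target_policy: the new policy to evaluate; maps a context to a slate of item ids
    success_prob:  a reward model fitted on logged data, returning
                   P(reward = 1 | context, slate)
    """
    if not contexts:
        raise ValueError("need at least one logged context")
    # Average the model's predicted success over the logged contexts.
    return sum(success_prob(x, target_policy(x)) for x in contexts) / len(contexts)


# Hypothetical toy inputs: two logged users and a fixed 3-item slate.
contexts = ["u1", "u2"]
policy = lambda x: [0, 1, 2]
model = lambda x, slate: 0.25 if x == "u1" else 0.75
print(direct_method_value(contexts, policy, model))
```

The design trade-off is the usual one: the estimate is only as good as the reward model, but it needs no importance weights, whose variance grows quickly when the action is a whole slate drawn from a large catalog.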

Findings

1. PRR outperforms existing off-policy reward-optimizing methods.
2. PRR is far more scalable to large action spaces than those methods.
3. PRR enables fast recommendations using maximum inner product search (MIPS).
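The MIPS claim relies on item scores being inner products. A minimal sketch, assuming each item's score is the inner product of a user embedding and an item embedding, and that the slate is simply the K highest-scoring items (which maximizes success probability when that probability increases with every item's score). The embeddings and function names below are hypothetical, and the search is an exact linear scan rather than the paper's serving setup.

```python
import heapq
from typing import List, Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_k_slate(user_vec: Sequence[float],
                item_vecs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Exact maximum inner product search by linear scan.

    Returns the indices of the k items with the largest <user_vec, item_vec>,
    highest score first; these indices form the recommended slate.
    """
    return heapq.nlargest(k, range(len(item_vecs)),
                          key=lambda i: inner_product(user_vec, item_vecs[i]))


# Hypothetical 2-d embeddings for one user and four candidate items.
user = [1.0, 0.0]
items = [[0.1, 5.0], [2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k_slate(user, items, k=2))  # -> [1, 2]
```

The linear scan costs O(N·d) per request for N items of dimension d. In a low-latency setting it would be replaced by an approximate MIPS index, which is exactly what the inner-product form of the score makes possible.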

Abstract

We introduce Probabilistic Rank and Reward (PRR), a scalable probabilistic model for personalized slate recommendation. Our approach allows off-policy estimation of the reward in the scenario where the user interacts with at most one item from a slate of K items. We show that the probability of a slate being successful can be learned efficiently by combining the reward (whether the user interacted with the slate) and the rank (which item within the slate was selected). PRR outperforms existing off-policy reward-optimizing methods and is far more scalable to large action spaces. Moreover, PRR allows fast delivery of recommendations powered by maximum inner product search (MIPS), making it suitable for low-latency domains such as computational advertising.
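The abstract does not give PRR's parameterization. As a hedged illustration of how a single likelihood can cover both the reward and the rank when the user picks at most one item, the sketch below assumes a generic softmax over K + 1 outcomes: "no interaction" plus one outcome per slate position. The scores and the no-interaction score are hypothetical inputs (in a real model they would come from learned parameters); this is not the paper's exact PRR formula.

```python
import math
from typing import List, Optional, Sequence


def outcome_probs(item_scores: Sequence[float], no_click_score: float) -> List[float]:
    """Softmax over K + 1 outcomes.

    Index 0 is "no interaction" (reward 0); index k >= 1 means the item at
    0-based position k - 1 was selected (reward 1, rank k - 1).
    """
    logits = [no_click_score, *item_scores]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def slate_success_prob(item_scores: Sequence[float], no_click_score: float) -> float:
    """Reward part: probability that the user interacts with some item in the slate."""
    return 1.0 - outcome_probs(item_scores, no_click_score)[0]


def log_likelihood(item_scores: Sequence[float], no_click_score: float,
                   clicked: Optional[int]) -> float:
    """Log-likelihood of one logged slate.

    clicked=None: the user ignored the slate (reward 0).
    clicked=k:    the user selected the item at 0-based position k (reward 1, rank k),
                  so one term accounts for both the reward and the rank.
    """
    probs = outcome_probs(item_scores, no_click_score)
    return math.log(probs[0] if clicked is None else probs[clicked + 1])


# Hypothetical scores for a slate of K = 3 items and a no-interaction score of 0.
scores = [1.0, 0.5, -0.2]
print(round(slate_success_prob(scores, 0.0), 3))
print(round(log_likelihood(scores, 0.0, clicked=0), 3))
```

Fitting such a model means maximizing the summed `log_likelihood` over logged slates; a clicked slate then contributes information about which item won, not only that the slate succeeded.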

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing