Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction
Yi Wu, Daryl Chang, Jennifer She, Zhe Zhao, Li Wei, Lukasz Heldt

TL;DR
This paper introduces the Learned Ranking Function (LRF), a system that directly optimizes long-term user satisfaction by transforming short-term behavior predictions into optimized recommendation slates, demonstrated on YouTube.
Contribution
It models ranking directly as a slate optimization problem and introduces a novel constraint optimization algorithm that stabilizes trade-offs among multiple objectives, improving long-term user satisfaction over heuristic ranking functions.
Findings
Successful deployment on YouTube
Live experiments show improved long-term user satisfaction
Novel constraint optimization algorithm that stabilizes multi-objective trade-offs
Abstract
We present the Learned Ranking Function (LRF), a system that takes short-term user-item behavior predictions as input and outputs a slate of recommendations that directly optimizes for long-term user satisfaction. Most previous work is based on optimizing the hyperparameters of a heuristic function. We propose to model the problem directly as a slate optimization problem with the objective of maximizing long-term user satisfaction. We also develop a novel constraint optimization algorithm that stabilizes objective trade-offs for multi-objective optimization. We evaluate our approach with live experiments and describe its deployment on YouTube.
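To make the contrast in the abstract concrete, here is a minimal sketch of the kind of heuristic ranking function that prior work tunes: a weighted sum of short-term behavior predictions, used to sort candidates into a slate. This is not the paper's method. The prediction names (`p_click`, `p_like`), the weights, the item ids, and the helper functions are all hypothetical and chosen only for illustration. The LRF, as described above, replaces hand-tuned weights of this sort with a function learned to maximize long-term user satisfaction.

```python
from typing import Dict, List

# Hypothetical short-term predictions for one candidate item,
# keyed by behavior name (names and values are illustrative only).
Predictions = Dict[str, float]


def heuristic_score(preds: Predictions, weights: Dict[str, float]) -> float:
    """Weighted sum of short-term behavior predictions.

    The weights are the hand-tuned hyperparameters of the heuristic.
    """
    return sum(weights[name] * preds[name] for name in weights)


def rank_slate(
    candidates: Dict[str, Predictions],
    weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Return the ids of the k highest-scoring candidates, best first."""
    ranked = sorted(
        candidates,
        key=lambda item: heuristic_score(candidates[item], weights),
        reverse=True,
    )
    return ranked[:k]


# Illustrative inputs (assumed, not taken from the paper).
candidates = {
    "video_a": {"p_click": 0.30, "p_like": 0.05},
    "video_b": {"p_click": 0.20, "p_like": 0.20},
    "video_c": {"p_click": 0.10, "p_like": 0.02},
}
weights = {"p_click": 1.0, "p_like": 2.0}

slate = rank_slate(candidates, weights, k=2)
print(slate)
```

With these assumed weights, `video_b` outranks `video_a` because its higher like probability outweighs its lower click probability. Changing the weights changes the slate, which is the sensitivity that motivates learning the ranking function rather than tuning it by hand.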
Taxonomy
Topics: Decision-Making and Behavioral Economics
