Learned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction

Yi Wu; Daryl Chang; Jennifer She; Zhe Zhao; Li Wei; Lukasz Heldt

arXiv:2408.06512·cs.LG·August 14, 2024

Open Access

TL;DR

This paper introduces the Learned Ranking Function (LRF), a system that directly optimizes long-term user satisfaction by transforming short-term behavior predictions into optimized recommendation slates, demonstrated on YouTube.

Contribution

It proposes a novel slate optimization approach, paired with a constraint optimization algorithm that stabilizes multi-objective trade-offs, and reports improved long-term user satisfaction over heuristic methods.

Findings

01 Successful deployment on YouTube

02 Live experiments show improved user satisfaction

03 Novel constraint optimization algorithm

Abstract

We present the Learned Ranking Function (LRF), a system that takes short-term user-item behavior predictions as input and outputs a slate of recommendations that directly optimizes for long-term user satisfaction. Most previous work is based on optimizing the hyperparameters of a heuristic function. We propose to model the problem directly as a slate optimization problem with the objective of maximizing long-term user satisfaction. We also develop a novel constraint optimization algorithm that stabilizes objective trade-offs for multi-objective optimization. We evaluate our approach with live experiments and describe its deployment on YouTube.
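For context, the heuristic baseline that the abstract contrasts against can be pictured as a weighted sum of short-term predictions, with the weights acting as the hyperparameters to be tuned. The sketch below illustrates only that baseline, not the LRF itself. The prediction names (`p_click`, `p_long_watch`, `p_dismiss`), the weight values, and both function names are hypothetical and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical hand-tuned weights: these play the role of the heuristic's
# hyperparameters. Names and values are illustrative assumptions only.
WEIGHTS: Dict[str, float] = {
    "p_click": 1.0,
    "p_long_watch": 2.0,
    "p_dismiss": -1.5,
}


def heuristic_score(predictions: Dict[str, float],
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of short-term behavior predictions for one candidate."""
    return sum(w * predictions.get(name, 0.0) for name, w in weights.items())


def rank_slate(candidates: Dict[str, Dict[str, float]],
               slate_size: int) -> List[str]:
    """Return the ids of the highest-scoring candidates, best first."""
    ordered = sorted(
        candidates,
        key=lambda item_id: heuristic_score(candidates[item_id]),
        reverse=True,
    )
    return ordered[:slate_size]


# Made-up short-term predictions for three candidate items.
candidates = {
    "video_a": {"p_click": 0.30, "p_long_watch": 0.10, "p_dismiss": 0.02},
    "video_b": {"p_click": 0.20, "p_long_watch": 0.25, "p_dismiss": 0.01},
    "video_c": {"p_click": 0.40, "p_long_watch": 0.05, "p_dismiss": 0.20},
}
slate = rank_slate(candidates, slate_size=2)
```

In this baseline each item is scored independently and the weights are chosen by hand or by search. The abstract's proposal differs in that the whole slate is optimized directly for long-term user satisfaction.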

Taxonomy

Topics: Decision-Making and Behavioral Economics