Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems
Pan Li, Alexander Tuzhilin

TL;DR
This paper introduces Deep Pareto Reinforcement Learning (DeepPRL), a novel method for multi-objective recommender systems that models complex relationships and personalizes recommendations, outperforming existing methods in offline and real-world tests.
Contribution
DeepPRL systematically models dynamic, heterogeneous relationships between multiple objectives and personalizes recommendations, improving both short-term and long-term performance.
Findings
Achieves significant Pareto-dominance over state-of-the-art baselines.
Improves three conflicting business objectives on Alibaba's video platform.
Demonstrates tangible economic benefits in real-world deployment.
Abstract
Optimizing multiple objectives simultaneously is an important task for recommendation platforms to improve their performance. However, this task is particularly challenging because the relationships between objectives are heterogeneous across consumers and fluctuate dynamically with context. In particular, when objectives conflict with each other, the recommendation results form a Pareto frontier, where an improvement in any one objective comes at the cost of a performance decrease in another. Existing multi-objective recommender systems do not systematically consider such dynamic relationships; instead, they balance the objectives in a static and uniform manner, resulting in suboptimal multi-objective recommendation performance. In this paper, we propose a Deep Pareto Reinforcement Learning…
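To make the Pareto-frontier notion in the abstract concrete, here is a minimal sketch in Python. It is not the DeepPRL method, only an illustration of Pareto dominance. The function names (`dominates`, `pareto_frontier`) and the two-objective candidate scores are hypothetical, and the sketch assumes higher values are better on every objective.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (assuming higher is better).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_frontier(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores for five candidate recommendation policies on two
# conflicting objectives (illustrative numbers, not results from the paper).
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
frontier = pareto_frontier(candidates)
print(frontier)
```

In this toy data, `(0.5, 0.5)` and `(0.1, 0.1)` are dominated by `(0.6, 0.6)`, so they drop out. The three remaining points are mutually non-dominated: moving from one to another improves one objective only by giving up some of the other, which is the trade-off the abstract describes.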
