Constrained Reinforcement Learning for Short Video Recommendation
Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, Pinghua Gong, Dong Zheng, Peng Jiang

TL;DR
This paper introduces a constrained reinforcement learning framework for short video recommendation that optimizes user watch time while accommodating auxiliary interactions. It outperforms baselines in simulations and improves watch time and interactions in live platform experiments.
Contribution
It proposes a novel two-stage actor-critic reinforcement learning method for constrained MDPs tailored to short video recommendation, effectively balancing multiple user response objectives.
Findings
Outperforms baseline methods in simulations for main and auxiliary objectives.
Significantly improves user watch time and interactions in live experiments.
Successfully deployed in a production system for real-world short video recommendations.
Abstract
The wide popularity of short videos on social media creates new opportunities and challenges for optimizing recommender systems on video-sharing platforms. Users provide complex, multi-faceted responses to recommendations, including watch time and various types of interactions with videos. As a result, established recommendation algorithms that target a single objective are not adequate for optimizing the comprehensive user experience. In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of long-term user watch time, subject to the constraint of accommodating auxiliary responses such as sharing or downloading videos. To solve the constrained MDP, we propose a two-stage reinforcement learning approach based on actor-critic…
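The constrained formulation described in the abstract can be written in a generic form. What follows is a sketch of a standard constrained MDP objective, not necessarily the paper's exact notation: the policy \pi, discount factor \gamma, main reward r_0 (watch time), auxiliary rewards r_i (interactions such as sharing or downloading), and thresholds C_i are all symbols assumed here for illustration.

```latex
\begin{aligned}
\max_{\pi} \quad & \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{0}(s_t, a_t) \right] \\
\text{s.t.} \quad & \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{i}(s_t, a_t) \right] \ge C_{i}, \qquad i = 1, \dots, m
\end{aligned}
```

Under this reading, watch time is the quantity being maximized, while each auxiliary interaction only has to stay above its assumed threshold rather than be maximized itself.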
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Digital Games and Media
