Supervised Learning-enhanced Multi-Group Actor Critic for Live Stream Allocation in Feed
Jingxin Liu, Xiang Gao, Yisha Li, Xin Li, Haiyang Lu, Ben Wang

TL;DR
This paper introduces a novel reinforcement learning algorithm, SL-MGAC, that improves live stream recommendation by enhancing stability and reducing variance, leading to better long-term user engagement in feed systems.
Contribution
The paper proposes a supervised learning-enhanced multi-group actor-critic algorithm with variance reduction and a new reward function for stable, effective live stream allocation in recommendation systems.
Findings
Outperforms baseline methods in offline evaluations.
Demonstrates improved stability in online A/B tests.
Effectively balances long-term engagement and allocation greediness.
Abstract
In a mixed short video and live stream recommendation scenario, the live stream recommendation system (RS) decides whether to allocate at most one live stream into the video feed for each user request. To maximize long-term user engagement, it is crucial to determine an optimal live stream policy for accurate live stream allocation. An inappropriate allocation policy that ignores the long-term negative impact of live stream allocation can significantly affect app usage duration and user retention. Recently, reinforcement learning (RL) has been widely applied in recommendation systems to capture long-term user engagement. However, traditional RL algorithms often face divergence and instability problems, which restrict their application and deployment in large-scale industrial recommendation systems, especially in the aforementioned challenging…
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research
