Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems
Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, Dawei Yin

TL;DR
This paper introduces FeedRec, a reinforcement learning framework designed to optimize long-term user engagement in feed streaming recommender systems by modeling complex behaviors and simulating the environment.
Contribution
The paper presents a novel RL framework that pairs a hierarchical LSTM-based Q-Network with an S-Network that simulates the environment, in order to optimize long-term engagement.
Findings
FeedRec outperforms state-of-the-art methods on synthetic and real data.
It effectively models complex user behaviors including instant and delayed feedback.
The approach improves long-term user engagement metrics.
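To make "long-term engagement" concrete, here is a minimal Python sketch; it is not the paper's implementation. It assumes engagement is scored as a discounted sum of per-step rewards, where each reward is a weighted mix of an instant signal (for example a click) and a delayed signal (for example dwell time). The function names, weights, and discount factor are hypothetical and chosen only for illustration.

```python
def combined_reward(instant, delayed, weights=(1.0, 0.5)):
    """Mix instant and delayed feedback into one scalar reward.

    The weights are illustrative assumptions, not values from the paper.
    """
    w_instant, w_delayed = weights
    return w_instant * instant + w_delayed * delayed


def discounted_return(rewards, gamma=0.9):
    """Return sum over t of gamma**t * rewards[t], accumulated backward."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def td_target(reward, next_q_values, gamma=0.9):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    # Three feed steps: (instant click, delayed dwell signal), hypothetical data.
    session = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
    rewards = [combined_reward(i, d) for i, d in session]
    print(rewards)
    print(discounted_return(rewards, gamma=0.5))
```

A Q-Network trained toward `td_target` estimates this kind of discounted return, which is why a recommendation with a low immediate reward can still be preferred when it leads to more engagement later in the session.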
Abstract
Recommender systems play a crucial role in our daily lives. The feed streaming mechanism has been widely used in recommender systems, especially in mobile apps. The feed streaming setting offers users interactive recommendation through never-ending feeds. In such an interactive setting, a good recommender system should pay more attention to user stickiness, which goes far beyond classical instant metrics and is typically measured by long-term user engagement. Directly optimizing long-term user engagement is a non-trivial problem, as the learning target is usually not available to conventional supervised learning methods. Though reinforcement learning (RL) naturally fits the problem of maximizing long-term rewards, applying RL to optimize long-term user engagement still faces challenges: user behaviors are versatile and difficult to model, which typically…
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
