Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

Lixin Zou; Long Xia; Zhuoye Ding; Jiaxing Song; Weidong Liu; Dawei Yin

arXiv:1902.05570·cs.IR·July 12, 2019·21 cites

TL;DR

This paper introduces FeedRec, a reinforcement learning framework designed to optimize long-term user engagement in feed streaming recommender systems by modeling complex behaviors and simulating the environment.

Contribution

The paper presents a novel RL framework for long-term engagement optimization, combining a hierarchical LSTM-based Q-Network with an S-Network that simulates the environment.

Findings

01 FeedRec outperforms state-of-the-art methods on synthetic and real data.

02 It effectively models complex user behaviors including instant and delayed feedback.

03 The approach improves long-term user engagement metrics.

Abstract

Recommender systems play a crucial role in our daily lives. The feed streaming mechanism is widely used in recommender systems, especially in mobile apps. The feed streaming setting lets users interact with recommendations in a never-ending feed. In such an interactive setting, a good recommender system should pay more attention to user stickiness, which goes far beyond classical instant metrics and is typically measured by long-term user engagement. Directly optimizing long-term user engagement is a non-trivial problem, as the learning target is usually not available to conventional supervised learning methods. Though reinforcement learning (RL) naturally fits the problem of maximizing long-term rewards, applying RL to optimize long-term user engagement still faces challenges: user behaviors are versatile and difficult to model, which typically…
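To make the contrast between instant metrics and long-term rewards concrete, here is a minimal Python sketch of the standard discounted return used in RL. It is not taken from the paper: the function name, the discount factor of 0.9, and the two reward sequences are hypothetical values chosen only for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Return sum_t gamma**t * rewards[t], accumulated from the last step backward."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical per-step engagement rewards for one user session (assumed values).
# "instant_heavy" earns a large immediate reward, then the user stops engaging.
# "sticky" earns less at first, but the user keeps interacting with the feed.
instant_heavy = [1.0, 0.0, 0.0, 0.0]
sticky = [0.5, 0.5, 0.5, 0.5]

instant_heavy_return = discounted_return(instant_heavy)
sticky_return = discounted_return(sticky)
```

Under these assumed numbers, the first sequence wins on the step-one instant metric, while the second accumulates the larger discounted return, which is the kind of quantity an RL agent would maximize.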

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory