MBCAL: Sample Efficient and Variance Reduced Reinforcement Learning for Recommender Systems

Fan Wang; Xiaomin Fang; Lihang Liu; Hao Tian; Zhiming Peng

arXiv:1911.02248 · cs.IR · June 19, 2020

TL;DR

This paper introduces MBCAL, a model-based reinforcement learning approach for recommender systems that improves sample efficiency and reduces variance by using environment modeling and counterfactual comparisons, enabling safer and more effective long-term utility optimization.

Contribution

The paper proposes MBCAL, a novel model-based RL method tailored for recommender systems, incorporating environment and advantage models with counterfactuals to enhance efficiency and stability.

Findings

01

MBCAL achieves higher sample efficiency than existing methods.

02

It significantly reduces variance in learning the future advantage.

03

In experiments, MBCAL outperforms supervised and RL-based baselines.

Abstract

In recommender systems such as news feed streams, it is essential to optimize long-term utility over the continuous user-system interaction process. Previous work has demonstrated the capability of reinforcement learning on this problem. However, implementing deep reinforcement learning in online systems poses many practical challenges, including low sample efficiency, uncontrollable risk, and excessive variance. To address these issues, we propose a novel reinforcement learning method, model-based counterfactual advantage learning (MBCAL). The method exploits the characteristics of recommender systems and draws on model-based reinforcement learning for higher sample efficiency. It has two components: an environment model that predicts the instant user behavior one-by-one in an auto-regressive form, and a future advantage model that…
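The abstract's phrase "predicts the instant user behavior one-by-one in an auto-regressive form" can be illustrated with a toy sketch. This is not the paper's model: the function names (`predict_click_prob`, `rollout`), the per-item `weights` dictionary, and the logistic form are all hypothetical stand-ins chosen for brevity. The only point shown is the auto-regressive loop, where each predicted feedback is appended to the history that conditions the next prediction.

```python
import math


def predict_click_prob(history, item, weights):
    """Toy stand-in for an environment model: P(click | history, item).

    `weights` is a hypothetical per-item bias table, not anything from the paper.
    """
    # Single history feature: mean of past (predicted) feedback values.
    past = sum(feedback for _, feedback in history)
    mean_feedback = past / len(history) if history else 0.0
    logit = weights.get(item, 0.0) + mean_feedback
    return 1.0 / (1.0 + math.exp(-logit))


def rollout(items, weights):
    """Predict feedback for each item in order, one step at a time."""
    history, probs = [], []
    for item in items:
        p = predict_click_prob(history, item, weights)
        probs.append(p)
        # Auto-regressive step: the prediction joins the history
        # that the next prediction is conditioned on.
        history.append((item, p))
    return probs


if __name__ == "__main__":
    print(rollout(["a", "b", "c"], {"a": 0.0, "b": 0.0, "c": -1.0}))
```

In a real system the predictor would be a learned sequence model trained on logged interactions; the hand-written logistic here only keeps the example self-contained.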

Taxonomy

Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Data Stream Mining Techniques