A General Offline Reinforcement Learning Framework for Interactive Recommendation
Teng Xiao, Donglin Wang

TL;DR
This paper introduces a comprehensive offline reinforcement learning framework for interactive recommendation systems, enabling maximization of user rewards without online exploration, through probabilistic modeling and distribution mismatch mitigation.
Contribution
The paper presents a novel offline RL framework with five strategies to reduce distribution mismatch, validated by extensive experiments on real-world datasets.
Findings
The proposed methods outperform existing supervised and RL approaches.
The proposed strategies reduce the distribution mismatch between the logging policy and the recommendation policy.
The methods achieve superior recommendation performance on real-world datasets.
Abstract
This paper studies the problem of learning interactive recommender systems from logged feedback without any exploration in online environments. We address the problem by proposing a general offline reinforcement learning framework for recommendation, which enables maximizing cumulative user rewards without online exploration. Specifically, we first introduce a probabilistic generative model for interactive recommendation, and then propose an effective inference algorithm for discrete and stochastic policy learning based on logged feedback. To perform offline learning more effectively, we propose five approaches to minimize the distribution mismatch between the logging policy and the recommendation policy: support constraints, supervised regularization, policy constraints, dual constraints, and reward extrapolation. We conduct extensive experiments on two public real-world…
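To make the idea of a policy constraint concrete, here is a minimal sketch of one common way to penalize mismatch between a discrete stochastic recommendation policy and the logging policy: subtract a KL-divergence term from the expected value. This is an illustration under stated assumptions, not the paper's formulation. The function names, the `alpha` weight, the toy Q-values, and the assumption that the logging policy's probabilities are known are all hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis (one row per state).
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_penalized_objective(logits, q_values, logging_probs, alpha=1.0, eps=1e-8):
    """Return (objective, mean KL), where
    objective = mean over states of E_pi[Q] - alpha * KL(pi || logging policy).

    Assumption: logging_probs is a known or estimated distribution over items
    for each state; this is a generic penalty, not the paper's exact method.
    """
    pi = softmax(logits)
    expected_q = (pi * q_values).sum(axis=-1)
    kl = (pi * (np.log(pi + eps) - np.log(logging_probs + eps))).sum(axis=-1)
    return float((expected_q - alpha * kl).mean()), float(kl.mean())

# Toy example: one state, three candidate items (all values hypothetical).
logging_probs = np.array([[0.7, 0.2, 0.1]])
q_values = np.array([[1.0, 2.0, 5.0]])

# A policy identical to the logging policy incurs (almost) no penalty.
same_logits = np.log(logging_probs)
obj_same, kl_same = kl_penalized_objective(same_logits, q_values, logging_probs)

# A near-greedy policy on the rarely logged item incurs a large penalty.
greedy_logits = np.array([[0.0, 0.0, 10.0]])
obj_greedy, kl_greedy = kl_penalized_objective(greedy_logits, q_values, logging_probs)
```

A larger `alpha` keeps the learned policy closer to the logged behavior, trading potential reward for less reliance on value estimates for items the logging policy rarely showed.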
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Smart Grid Energy Management
