A General Offline Reinforcement Learning Framework for Interactive Recommendation

Teng Xiao; Donglin Wang

arXiv:2310.00678·cs.LG·October 3, 2023·2 cites

TL;DR

This paper introduces a general offline reinforcement learning framework for interactive recommendation. It learns a policy that maximizes cumulative user rewards from logged feedback alone, without online exploration, by combining a probabilistic generative model with techniques that reduce the distribution mismatch between the logging policy and the recommendation policy.

Contribution

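The paper presents an offline RL framework with five strategies for reducing distribution mismatch (support constraints, supervised regularization, policy constraints, dual constraints, and reward extrapolation), validated by extensive experiments on real-world datasets.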

Findings

01

The proposed methods outperform existing supervised and RL approaches.

02

The proposed approaches reduce the distribution mismatch between the logging policy and the recommendation policy.

03

The framework achieves superior recommendation performance on real-world datasets.

Abstract

This paper studies the problem of learning interactive recommender systems from logged feedback without any exploration in online environments. We address the problem by proposing a general offline reinforcement learning framework for recommendation, which enables maximizing cumulative user rewards without online exploration. Specifically, we first introduce a probabilistic generative model for interactive recommendation, and then propose an effective inference algorithm for discrete and stochastic policy learning based on logged feedback. To perform offline learning more effectively, we propose five approaches to minimize the distribution mismatch between the logging policy and the recommendation policy: support constraints, supervised regularization, policy constraints, dual constraints, and reward extrapolation. We conduct extensive experiments on two public real-world…
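
The abstract names policy constraints as one way to limit distribution mismatch but does not give the formulation here. As a rough illustration only, the sketch below uses a KL penalty between a discrete stochastic recommendation policy and an estimated logging policy, which is a common choice in offline RL and not necessarily the authors' objective. The names `constrained_objective`, `alpha`, `q_values`, and `logging_probs` are hypothetical, and the numbers are made up for the demonstration.

```python
import math


def softmax(logits):
    """Turn unnormalized item scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same items."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def constrained_objective(policy_logits, q_values, logging_probs, alpha=1.0):
    """Expected value under the policy minus alpha * KL(policy || logging).

    Assumption: q_values are per-item value estimates and logging_probs is
    an estimate of the logging policy; both are hypothetical inputs here.
    """
    policy = softmax(policy_logits)
    expected_q = sum(p * q for p, q in zip(policy, q_values))
    penalty = kl_divergence(policy, logging_probs)
    return expected_q - alpha * penalty


if __name__ == "__main__":
    q_values = [1.0, 2.0, 5.0]       # made-up value estimates for 3 items
    logging_probs = [0.7, 0.2, 0.1]  # made-up logging policy
    greedy_logits = [0.0, 0.0, 10.0]  # policy concentrated on a rarely logged item

    for alpha in (0.0, 1.0):
        score = constrained_objective(greedy_logits, q_values, logging_probs, alpha)
        print(f"alpha={alpha}: objective={score:.3f}")
```

With `alpha=0` the objective is just the expected value estimate, so a policy can chase an item the logs barely cover. A positive `alpha` lowers the score of such a policy, which pulls the learned policy back toward actions that the logged data actually supports.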

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Smart Grid Energy Management