Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning based Recommendation

Siyu Wang; Xiaocong Chen; Lina Yao; Sally Cripps; Julian McAuley

arXiv:2208.05142·cs.IR·November 2, 2023·1 cites

Open Access

TL;DR

This paper introduces a model-agnostic counterfactual synthesis policy that augments user interaction data in reinforcement learning-based recommender systems, enhancing their performance and adaptability.

Contribution

It proposes a counterfactual data augmentation method built on a general, model-agnostic synthesis policy, which can be trained in two ways: learning with expert demonstrations or joint training with the RL recommender.

Findings

1. Improves recommendation performance in online simulations
2. Enhances generalization on offline datasets
3. Effective across multiple RL algorithms

Abstract

Recent advances in recommender systems have proved the potential of Reinforcement Learning (RL) to handle the dynamic evolution processes between users and recommender systems. However, learning to train an optimal RL agent is generally impractical with commonly sparse user feedback data in the context of recommender systems. To circumvent the lack of interaction of current RL-based recommender systems, we propose to learn a general Model-Agnostic Counterfactual Synthesis (MACS) Policy for counterfactual user interaction data augmentation. The counterfactual synthesis policy aims to synthesise counterfactual states while preserving significant information in the original state relevant to the user's interests, building upon two different training approaches we designed: learning with expert demonstrations and joint training. As a result, the synthesis of each counterfactual data is…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Smart Grid Energy Management

Methods: Batch Normalization · Adam · Weight Decay · Convolution · Experience Replay · Dense Connections · Deep Deterministic Policy Gradient · Soft Actor Critic