Plug-and-Play Model-Agnostic Counterfactual Policy Synthesis for Deep Reinforcement Learning based Recommendation
Siyu Wang, Xiaocong Chen, Lina Yao, Sally Cripps, Julian McAuley

TL;DR
This paper introduces a model-agnostic counterfactual synthesis policy that augments user interaction data in reinforcement learning-based recommender systems, enhancing their performance and adaptability.
Contribution
It proposes a novel counterfactual data augmentation method using a general, model-agnostic policy trained with expert demonstrations and joint training, improving RL recommender systems.
Findings
Improves recommendation performance in online simulations
Enhances generalization on offline datasets
Effective across multiple RL algorithms
Abstract
Recent advances in recommender systems have proved the potential of Reinforcement Learning (RL) to handle the dynamic evolution processes between users and recommender systems. However, learning to train an optimal RL agent is generally impractical with commonly sparse user feedback data in the context of recommender systems. To circumvent the lack of interaction of current RL-based recommender systems, we propose to learn a general Model-Agnostic Counterfactual Synthesis (MACS) Policy for counterfactual user interaction data augmentation. The counterfactual synthesis policy aims to synthesise counterfactual states while preserving significant information in the original state relevant to the user's interests, building upon two different training approaches we designed: learning with expert demonstrations and joint training. As a result, the synthesis of each counterfactual data is…
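The augmentation idea in the abstract can be illustrated with a toy sketch. This is not the MACS policy itself: the learned synthesis policy is replaced here by a fixed, hand-written perturbation, and every name (`synthesize_counterfactual`, `augment_buffer`, `keep_mask`, `noise_scale`) is hypothetical. It assumes a state is a flat list of floats, that the dimensions relevant to the user's interests are already known and marked in `keep_mask`, and that the original action, reward, and next state can be reused for the synthetic transition. None of these assumptions comes from the paper.

```python
import random
from typing import List, Sequence, Tuple

# (state, action, reward, next_state); a simplified transition format assumed for this sketch.
Transition = Tuple[List[float], int, float, List[float]]


def synthesize_counterfactual(
    state: Sequence[float],
    keep_mask: Sequence[bool],
    noise_scale: float,
    rng: random.Random,
) -> List[float]:
    """Return a copy of `state` with Gaussian noise added to unmasked dimensions.

    Dimensions where keep_mask is True are copied unchanged, standing in for
    "information relevant to the user's interests". A learned policy would
    decide what to change; this fixed rule is only a placeholder.
    """
    return [
        value if keep else value + rng.gauss(0.0, noise_scale)
        for value, keep in zip(state, keep_mask)
    ]


def augment_buffer(
    buffer: Sequence[Transition],
    keep_mask: Sequence[bool],
    noise_scale: float = 0.1,
    seed: int = 0,
) -> List[Transition]:
    """Return the original transitions followed by one synthetic copy of each."""
    rng = random.Random(seed)
    augmented = list(buffer)
    for state, action, reward, next_state in buffer:
        cf_state = synthesize_counterfactual(state, keep_mask, noise_scale, rng)
        # Assumption: action, reward, and next state are reused as-is.
        augmented.append((cf_state, action, reward, next_state))
    return augmented


if __name__ == "__main__":
    replay = [([1.0, 2.0, 3.0], 0, 1.0, [1.5, 2.5, 3.5])]
    for transition in augment_buffer(replay, keep_mask=[True, False, True]):
        print(transition)
```

Because the output has the same transition format as the input, any off-policy learner that samples from a replay buffer could consume it unchanged, which is the sense in which such an augmentation step is model-agnostic.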
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Smart Grid Energy Management
Methods: Batch Normalization · Adam · Weight Decay · Convolution · Experience Replay · Dense Connections · Deep Deterministic Policy Gradient · Soft Actor Critic
