TL;DR
This study reproduces and extends previous research on state encoders in reinforcement learning for recommendation systems, revealing that prior conclusions about the best encoder type do not generalize across different datasets, methods, and debiased environments.
Contribution
The study provides a comprehensive reproducibility analysis of state encoders in RL4Rec, testing them across new datasets, RL methods, and debiased simulators, and its results challenge previous findings.
Findings
Attention-based encoder not always optimal
Results vary with dataset and RL method
Prior conclusions do not generalize broadly
Abstract
Methods for reinforcement learning for recommendation (RL4Rec) are increasingly receiving attention as they can quickly adapt to user feedback. A typical RL4Rec framework consists of (1) a state encoder that encodes the state, which stores the user's historical interactions, and (2) an RL method that takes actions and observes rewards. Prior work compared four state encoders in an environment where user feedback is simulated based on real-world logged user data. An attention-based state encoder was found to be the optimal choice as it reached the highest performance. However, this finding is limited to the actor-critic method, four state encoders, and evaluation simulators that do not debias logged user data. In response to these shortcomings, we reproduce and expand on the existing comparison of attention-based state encoders (1) in the publicly available debiased RL4Rec SOFA simulator with (2)…
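To make the split between state encoder and RL method concrete, here is a minimal pure-Python sketch of an attention-based state encoder. It assumes the state is the list of item IDs a user has interacted with and that each item has a fixed embedding. The names `attention_state` and `greedy_action`, the learned `query` vector, and the toy embeddings are hypothetical; this is not the encoder evaluated in the paper or shipped with the SOFA simulator. `greedy_action` only stands in for the RL method (e.g., the actor in actor-critic) that would choose items and learn from rewards.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_state(history: Sequence[int],
                    item_emb: Dict[int, Vector],
                    query: Vector) -> Vector:
    """Hypothetical encoder: pool the embeddings of the user's past items
    into one state vector using scaled dot-product attention weights."""
    d = len(query)
    if not history:
        return [0.0] * d  # empty history -> zero state
    embs = [item_emb[i] for i in history]
    weights = softmax([dot(query, e) / math.sqrt(d) for e in embs])
    # Weighted sum of item embeddings, one output dimension at a time.
    return [sum(w * e[k] for w, e in zip(weights, embs)) for k in range(d)]


def greedy_action(state: Vector,
                  item_emb: Dict[int, Vector],
                  exclude: Sequence[int]) -> int:
    """Placeholder for the RL method: recommend the unseen item whose
    embedding has the highest dot product with the state."""
    candidates = [i for i in item_emb if i not in exclude]
    return max(candidates, key=lambda i: dot(state, item_emb[i]))


# Toy example with hypothetical 2-d item embeddings.
item_emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7], 3: [-1.0, 0.0]}
query = [1.0, 0.0]
history = [0, 1]
state = attention_state(history, item_emb, query)
action = greedy_action(state, item_emb, exclude=history)
print(state, action)
```

Here the history `[0, 1]` is pooled into a state weighted toward item 0, because item 0 aligns with the query, and the stand-in policy then recommends item 2 over item 3. In an actual RL4Rec setup, the embeddings and attention parameters would be trained jointly with the RL method from (simulated) rewards.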
