TL;DR
This paper introduces a novel reinforcement learning framework for recommender systems that decomposes the large action space into hyper-actions and effect-actions, improving recommendation performance and stability.
Contribution
It proposes a hyper-actor and critic framework with an alignment and supervision module to regulate the latent action space in recommendation tasks.
Findings
Outperforms standard RL baselines in simulated environments
Stabilizes the learning process
Improves recommendation accuracy with large action spaces
Abstract
In recommender systems, reinforcement learning solutions have effectively boosted recommendation performance because of their ability to capture long-term user-system interaction. However, the action space of the recommendation policy is a list of items, which could be extremely large with a dynamic candidate item pool. To overcome this challenge, we propose a hyper-actor and critic learning framework where the policy decomposes the item list generation process into a hyper-action inference step and an effect-action selection step. The first step maps the given state space into a vectorized hyper-action space, and the second step selects the item list based on the hyper-action. In order to regulate the discrepancy between the two action spaces, we design an alignment module along with a kernel mapping function for items to ensure inference accuracy and include a supervision module to…
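The two-step decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear actor `W`, the linear item kernel `K`, the dot-product scoring, and all dimensions are hypothetical choices made for illustration, and the alignment and supervision modules are not modeled.

```python
import numpy as np

def infer_hyper_action(state, W):
    """Step 1: map a state vector to a vectorized hyper-action.
    Assumption: a single linear layer with tanh stands in for the actor."""
    return np.tanh(W @ state)

def item_kernel(item_features, K):
    """Map raw item features into the hyper-action space.
    Assumption: a linear map stands in for the kernel mapping function."""
    return item_features @ K.T

def select_effect_action(hyper_action, item_embeddings, k):
    """Step 2: score every candidate against the hyper-action and
    return the indices of the k best-scoring items as the item list.
    Assumption: dot-product scoring followed by top-k selection."""
    scores = item_embeddings @ hyper_action
    top_k = np.argsort(-scores)[:k]
    return top_k, scores

# Hypothetical sizes; the candidate pool can change between calls
# without changing the dimensionality of the hyper-action.
rng = np.random.default_rng(0)
state_dim, feat_dim, latent_dim, n_items, k = 8, 6, 4, 100, 5

W = rng.normal(size=(latent_dim, state_dim))   # actor weights (untrained)
K = rng.normal(size=(latent_dim, feat_dim))    # kernel weights (untrained)
state = rng.normal(size=state_dim)
items = rng.normal(size=(n_items, feat_dim))

hyper_action = infer_hyper_action(state, W)
item_embeddings = item_kernel(items, K)
item_list, scores = select_effect_action(hyper_action, item_embeddings, k)
print(item_list)
```

In this sketch the policy outputs a fixed-size vector regardless of how many candidate items exist, so growing or shrinking the item pool only changes the scoring step, which is one way to read the motivation for separating the two action spaces.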
