Reinforcement Re-ranking with 2D Grid-based Recommendation Panels
Sirui Chen, Xiao Zhang, Xu Chen, Zhiyu Li, Yuan Wang, Quan Lin, Jun Xu

TL;DR
This paper introduces Panel-MDP, a reinforcement learning approach for effectively re-ranking items into 2D grid-based panels in recommender systems, addressing the challenge of non-sequential slot arrangements.
Contribution
The paper proposes a novel Markov decision process model for 2D grid-based recommendation re-ranking, improving user experience over traditional list-based methods.
Findings
Panel-MDP outperforms baseline models in grid panel recommendation quality.
Reinforcement learning with PPO effectively learns item placement strategies.
Simulation results demonstrate significant improvements in user engagement metrics.
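For readers unfamiliar with PPO, the clipped surrogate objective it maximizes can be sketched as follows. This is the generic PPO formulation, not the paper's training code; the function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default, not taken from the paper
) -> float:
    """Mean clipped surrogate objective over a batch of sampled actions."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(nl - ol)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a panel setting, each sampled action would presumably correspond to one item placement, with the advantage estimated from simulated user feedback; that mapping is an assumption here.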
Abstract
Modern recommender systems usually present items as a streaming, one-dimensional ranking list. Recently there has been a trend in e-commerce in which the recommended items are organized into two-dimensional grid-based panels, where users can view the items in both vertical and horizontal directions. Presenting items in grid-based result panels poses new challenges to recommender systems because existing models are all designed to output sequential lists, while the slots in a grid-based panel have no explicit order. Directly converting the item rankings into grids (e.g., pre-defining an order on the slots) overlooks the user-specific behavioral patterns on grid-based panels and inevitably hurts the user experience. To address this issue, we propose a novel Markov decision process (MDP) to place the items in 2D grid-based result panels at the final re-ranking stage of the recommender systems. The…
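The contrast drawn in the abstract can be illustrated with a minimal sketch. `row_major_fill` shows the naive conversion, where a pre-defined slot order (here row-major) maps a ranked list onto the grid. `PanelEpisode` shows one way a sequential placement process could be framed, with each step choosing both an item and an empty slot. The abstract is truncated before it defines the states, actions, or rewards of the proposed MDP, so every name and design choice below is a hypothetical illustration, not the paper's formulation.

```python
from typing import List, Optional, Tuple

Slot = Tuple[int, int]  # (row, column)
Grid = List[List[Optional[str]]]


def row_major_fill(ranked_items: List[str], n_rows: int, n_cols: int) -> Grid:
    """Naive baseline: place the i-th ranked item in the i-th slot, row by row."""
    grid: Grid = [[None] * n_cols for _ in range(n_rows)]
    for idx, item in enumerate(ranked_items[: n_rows * n_cols]):
        grid[idx // n_cols][idx % n_cols] = item
    return grid


class PanelEpisode:
    """Hypothetical episode: each step places one candidate in one empty slot."""

    def __init__(self, candidates: List[str], n_rows: int, n_cols: int) -> None:
        self.remaining = list(candidates)
        self.grid: Grid = [[None] * n_cols for _ in range(n_rows)]

    def empty_slots(self) -> List[Slot]:
        return [
            (r, c)
            for r, row in enumerate(self.grid)
            for c, value in enumerate(row)
            if value is None
        ]

    def step(self, item: str, slot: Slot) -> bool:
        """Apply one (item, slot) action; return True when the episode is over."""
        r, c = slot
        if item not in self.remaining:
            raise ValueError(f"{item!r} is not an available candidate")
        if self.grid[r][c] is not None:
            raise ValueError(f"slot {slot} is already filled")
        self.grid[r][c] = item
        self.remaining.remove(item)
        # Done when the panel is full or the candidates are exhausted.
        return not self.empty_slots() or not self.remaining
```

Unlike the row-major fill, nothing in `PanelEpisode` forces the top-ranked item into the top-left slot; a learned policy is free to pick any empty slot at each step.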
Taxonomy
Methods: Convolution · Q-Learning · Dense Connections · Deep Q-Network
