Sequential Search with Off-Policy Reinforcement Learning
Dadong Miao, Yanan Wang, Guoyu Tang, Lin Liu, Sulong Xu, Bo Long, Yun Xiao, Lingfei Wu, Yunjiang Jiang

TL;DR
This paper introduces a scalable hybrid model for sequential search that combines RNNs and attention mechanisms, utilizing off-policy reinforcement learning to improve multi-session personalized ranking in e-commerce.
Contribution
The paper proposes a hybrid learning framework with an efficient training method and applies off-policy reinforcement learning to improve personalized search ranking.
Findings
Significant improvements over baseline models on offline metrics.
Effective use of off-policy RL in multi-session search ranking.
Enhanced long-term user reward modeling.
Abstract
Recent years have seen significant interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success of Sequential Recommendation, there has been little study of Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior on historical query sessions. The SS learning task is even more important than its SR counterpart for most e-commerce companies because of its much larger online serving demands and traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from…
