Sequential Search with Off-Policy Reinforcement Learning

Dadong Miao; Yanan Wang; Guoyu Tang; Lin Liu; Sulong Xu; Bo Long; Yun Xiao; Lingfei Wu; Yunjiang Jiang

arXiv:2202.00245·cs.IR·February 2, 2022

TL;DR

This paper introduces a scalable hybrid model for sequential search that combines RNNs and attention mechanisms, utilizing off-policy reinforcement learning to improve multi-session personalized ranking in e-commerce.

Contribution

It proposes a novel hybrid learning framework with an efficient training method and applies off-policy reinforcement learning for enhanced personalized search ranking.

Findings

1. Significant improvements over baseline models on offline metrics.
2. Effective use of off-policy RL in multi-session search ranking.
3. Enhanced long-term user reward modeling.

Abstract

Recent years have seen significant interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success of Sequential Recommendation, there has been little study of Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior in historical query sessions. The SS learning task is even more important than its SR counterpart for most e-commerce companies, due to its much larger online serving demands and traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from…
