Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application
Yujing Hu, Qing Da, Anxiang Zeng, Yang Yu, Yinghui Xu

TL;DR
This paper introduces a reinforcement learning framework for multi-step ranking in e-commerce search, formalizing the problem, analyzing its properties, and demonstrating significant improvements over traditional methods in both simulation and real-world TaoBao data.
Contribution
It formalizes the search session ranking as an SSMDP, analyzes its properties, and proposes a novel policy gradient algorithm tailored for this setting.
Findings
Over 40% increase in transaction amount in simulation.
Over 30% increase in transaction amount in TaoBao.
Superior performance compared to online LTR methods.
Abstract
In e-commerce platforms such as Amazon and TaoBao, ranking items in a search session is a typical multi-step decision-making problem. Learning to rank (LTR) methods have been widely applied to ranking problems. However, such methods often treat the ranking steps in a session as independent, although these steps may in fact be highly correlated with each other. To better utilize the correlation between different ranking steps, in this paper, we propose to use reinforcement learning (RL) to learn an optimal ranking policy which maximizes the expected accumulative rewards in a search session. Firstly, we formally define the concept of search session Markov decision process (SSMDP) to formulate the multi-step ranking problem. Secondly, we analyze the properties of SSMDP and theoretically prove the necessity of maximizing accumulative rewards. Lastly, we propose a novel policy gradient…
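To make the phrase "accumulative rewards in a search session" concrete, here is a minimal sketch of how a session-level return could be computed from per-step rewards. It is not the paper's algorithm: the function name `session_return`, the discount factor `gamma`, and the reward values are all assumptions chosen for illustration.

```python
from typing import Sequence


def session_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Sum the per-step rewards of one search session.

    `gamma` is a hypothetical discount factor; gamma=1.0 gives the plain
    (undiscounted) accumulated reward.
    """
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Hypothetical session: three ranking steps with no purchase,
# then a purchase worth 50 at the fourth step.
rewards = [0.0, 0.0, 0.0, 50.0]

undiscounted = session_return(rewards, gamma=1.0)
discounted = session_return(rewards, gamma=0.9)
```

In this made-up session every step before the last earns nothing on its own, so a method that scores each ranking step independently sees no signal from the first three steps, while a session-level return credits them for leading to the final transaction.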
Taxonomy
Topics: Optimization and Search Problems · Advanced Bandit Algorithms Research · Auction Theory and Applications
