Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application

Yujing Hu; Qing Da; Anxiang Zeng; Yang Yu; Yinghui Xu

arXiv:1803.00710 · cs.LG · May 24, 2018 · 38 cites

TL;DR

This paper introduces a reinforcement learning framework for multi-step ranking in e-commerce search, formalizing the problem, analyzing its properties, and demonstrating significant improvements over traditional methods in both simulation and real-world TaoBao data.

Contribution

It formalizes multi-step ranking in a search session as a search session Markov decision process (SSMDP), analyzes its properties, and proposes a novel policy gradient algorithm tailored to this setting.

Findings

01. Over 40% increase in transaction amount in simulation.

02. Over 30% increase in transaction amount in TaoBao.

03. Superior performance compared to online LTR methods.

Abstract

In e-commerce platforms such as Amazon and TaoBao, ranking items in a search session is a typical multi-step decision-making problem. Learning to rank (LTR) methods have been widely applied to ranking problems. However, such methods often treat the ranking steps in a session as independent, when in fact they may be highly correlated. To better exploit the correlation between ranking steps, we propose using reinforcement learning (RL) to learn an optimal ranking policy that maximizes the expected accumulative reward in a search session. First, we formally define the search session Markov decision process (SSMDP) to formulate the multi-step ranking problem. Second, we analyze the properties of the SSMDP and theoretically prove the necessity of maximizing accumulative rewards. Lastly, we propose a novel policy gradient…
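
To make the idea of maximizing accumulative reward over a session concrete, here is a minimal Python sketch. It is not the paper's SSMDP definition or its algorithm. It assumes a toy session model in which, at each ranking step, the user either buys (the session ends with a reward equal to the deal price), leaves (the session ends with zero reward), or continues to the next step. The function name `expected_session_reward` and all probabilities and prices are hypothetical values chosen for illustration.

```python
from typing import Sequence


def expected_session_reward(
    buy_prob: Sequence[float],
    leave_prob: Sequence[float],
    deal_price: Sequence[float],
) -> float:
    """Expected accumulative reward of a toy search session (illustrative only).

    At step t the user buys with probability buy_prob[t] (reward deal_price[t],
    session ends), leaves with probability leave_prob[t] (reward 0, session
    ends), or otherwise continues to step t + 1. The session ends after the
    last step regardless of what the user does.
    """
    value = 0.0
    # Backward recursion: the value at step t is the immediate expected reward
    # plus the continuation probability times the value at step t + 1.
    for t in reversed(range(len(buy_prob))):
        continue_prob = 1.0 - buy_prob[t] - leave_prob[t]
        value = buy_prob[t] * deal_price[t] + continue_prob * value
    return value


# Hypothetical policy A: higher purchase probability at the first step,
# but users who do not buy leave immediately.
myopic = expected_session_reward([0.2, 0.0], [0.8, 1.0], [10.0, 0.0])

# Hypothetical policy B: lower purchase probability at the first step,
# but most users continue to a second step where many of them buy.
farsighted = expected_session_reward([0.1, 0.5], [0.1, 0.5], [10.0, 10.0])

print(myopic, farsighted)
```

Under these made-up numbers, policy A has the larger expected reward at the first step (about 2.0 versus 1.0), while policy B has the larger expected reward over the whole session (about 5.0 versus 2.0). A ranker that scores each step independently would favor A, whereas one that optimizes the session-level return would favor B.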

Code & Models

Repositories

UnibucProjects/DeepRLRecommenderSystem

Taxonomy

Topics: Optimization and Search Problems · Advanced Bandit Algorithms Research · Auction Theory and Applications