Cooperative Multi-Agent Deep Reinforcement Learning in Content Ranking Optimization
Zhou Qin, Kai Yuan, Pratik Lahiri, and Wenyang Liu

TL;DR
This paper introduces a multi-agent reinforcement learning approach for whole page content ranking in e-commerce, optimizing overall revenue rather than individual positions, and demonstrates significant improvements over traditional methods.
Contribution
The paper proposes a novel multi-agent deep reinforcement learning framework for whole page ranking, shifting from position-level to page-level optimization in content ranking systems.
Findings
MADDPG scales to 2.5 billion actions in the MuJoCo environment.
Outperforms deep bandits by 25.7% on offline e-commerce data.
Supports flexible, scalable joint optimization in content ranking.
Abstract
In a typical e-commerce setting, Content Ranking Optimization (CRO) mechanisms are employed to surface content on the search page to fulfill customers' shopping missions. CRO commonly uses models such as contextual deep bandits to rank content independently at different positions, e.g., one optimizer dedicated to organic search results and another to sponsored results. However, this regional optimization approach does not necessarily translate to whole page optimization, e.g., maximizing revenue at the top of the page may inadvertently diminish the revenue of lower positions. In this paper, we propose a reinforcement learning based method for whole page ranking to jointly optimize across all positions by: 1) shifting from position level optimization to whole page level optimization to achieve an overall optimized ranking; 2) applying reinforcement learning to optimize for the…
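The gap between position-level and whole-page optimization described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the item names, the revenue numbers, and the attention model in which a strong top item reduces revenue at the lower position. The page-level optimizer below uses brute-force enumeration over two positions. It is not the paper's multi-agent reinforcement learning method, only a small demonstration of why independent per-position choices can be suboptimal for the page as a whole.

```python
from itertools import product

# Hypothetical standalone expected revenue of each item at each position.
TOP_REVENUE = {"sponsored_A": 5.0, "organic_B": 4.0}
BOTTOM_REVENUE = {"organic_C": 3.0, "organic_D": 2.0}

# Assumed fraction of customer attention absorbed by the top item;
# the bottom position only earns from the attention that remains.
TOP_ATTENTION = {"sponsored_A": 0.8, "organic_B": 0.1}


def page_revenue(top, bottom):
    """Toy whole-page revenue: the top item discounts the bottom position."""
    return TOP_REVENUE[top] + BOTTOM_REVENUE[bottom] * (1.0 - TOP_ATTENTION[top])


def position_level_ranking():
    """Each position independently picks its best standalone item."""
    top = max(TOP_REVENUE, key=TOP_REVENUE.get)
    bottom = max(BOTTOM_REVENUE, key=BOTTOM_REVENUE.get)
    return top, bottom


def page_level_ranking():
    """Pick the joint assignment that maximizes whole-page revenue."""
    candidates = product(TOP_REVENUE, BOTTOM_REVENUE)
    return max(candidates, key=lambda pair: page_revenue(*pair))


if __name__ == "__main__":
    for name, ranking in [("position-level", position_level_ranking()),
                          ("page-level", page_level_ranking())]:
        print(f"{name}: {ranking} -> revenue {page_revenue(*ranking):.2f}")
```

In this toy setup the position-level choice puts the highest-earning item at the top even though it suppresses the lower position, while the joint choice accepts a slightly weaker top item for a higher page total. Enumeration is only feasible for tiny pages like this one. The action space grows combinatorially with the number of positions and candidates, which is the scaling problem a learned joint policy is meant to address.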
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining
Methods: Sparse Evolutionary Training · Dense Connections · Batch Normalization · Weight Decay · Experience Replay · Adam · Convolution · MADDPG
