MODRL-TA: A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search

Peng Cheng; Huimu Wang; Jinyuan Zhao; Yihao Wang; Enqiang Xu; Yu Zhao; Zhuojian Xiao; Songlin Wang; Guoyu Tang; Lin Liu; Sulong Xu

arXiv:2407.15476 · cs.LG · July 23, 2024

Open Access

TL;DR

This paper introduces MODRL-TA, a multi-objective deep reinforcement learning framework designed to optimize traffic allocation in e-commerce search, effectively balancing multiple objectives and addressing cold start issues through innovative ensemble, decision fusion, and data augmentation techniques.

Contribution

The paper presents a novel multi-objective deep reinforcement learning framework with ensemble models, dynamic objective weighting, and progressive data augmentation for traffic allocation in e-commerce search.

Findings

01. Significant improvements in traffic allocation performance on real-world systems.

02. Successful deployment of MODRL-TA on an e-commerce search platform.

03. Effective handling of cold start and distributional shift issues.

Abstract

Traffic allocation is the process of redistributing natural traffic to products by adjusting their positions in the post-search phase. It aims to foster merchant growth, meet customer demands precisely, and maximize the interests of the various parties on e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic allocation, whereas reinforcement learning approaches struggle to balance multiple objectives and to cope with cold starts in real-world data environments. To address these issues, this paper proposes a multi-objective deep reinforcement learning framework consisting of multi-objective Q-learning (MOQ), a decision fusion algorithm (DFM) based on the cross-entropy method (CEM), and a progressive data augmentation system (PDA). Specifically, MOQ constructs ensemble RL models,…
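To make the idea of CEM-based decision fusion concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each objective has its own Q-value vector over candidate actions, that fusion is a simple weighted sum, and that some scoring function rates a weight vector. The names `fuse_q_values`, `cem_weights`, and `score_fn`, along with all hyperparameters, are hypothetical and do not come from the paper.

```python
import random
import statistics

def fuse_q_values(q_per_objective, weights):
    """Weighted sum of per-objective Q-values for each action (assumed fusion rule)."""
    n_actions = len(q_per_objective[0])
    return [
        sum(w * q[a] for w, q in zip(weights, q_per_objective))
        for a in range(n_actions)
    ]

def normalize(raw):
    """Clip to positive values and rescale so the weights sum to 1."""
    clipped = [max(x, 1e-8) for x in raw]
    total = sum(clipped)
    return [x / total for x in clipped]

def cem_weights(score_fn, n_objectives, iters=30, pop=64, elite_frac=0.2, seed=0):
    """Cross-entropy method: sample weight vectors from a Gaussian, keep the
    best-scoring ones, and refit the Gaussian to those elites."""
    rng = random.Random(seed)
    mean = [1.0 / n_objectives] * n_objectives
    std = [0.5] * n_objectives
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        samples = [
            normalize([rng.gauss(m, s) for m, s in zip(mean, std)])
            for _ in range(pop)
        ]
        samples.sort(key=score_fn, reverse=True)  # higher score is better
        elite = samples[:n_elite]
        mean = [statistics.fmean(col) for col in zip(*elite)]
        # Small floor keeps the sampling distribution from collapsing to zero width.
        std = [statistics.pstdev(col) + 1e-3 for col in zip(*elite)]
    return normalize(mean)

# Toy usage: the score peaks at a hypothetical target weighting (0.7, 0.3).
target = [0.7, 0.3]
toy_score = lambda w: -sum((wi - ti) ** 2 for wi, ti in zip(w, target))
best = cem_weights(toy_score, n_objectives=2)
fused = fuse_q_values([[1.0, 2.0], [4.0, 0.0]], best)
```

In a real system the toy score would be replaced by an offline or online estimate of how well a given weighting serves the platform's objectives. The abstract does not specify that signal, so it is left as an assumption here.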

Taxonomy

Topics: Digital Platforms and Economics · Peer-to-Peer Network Technologies · Consumer Market Behavior and Pricing

Methods: Q-Learning