Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning

Jing-Cheng Shi; Yang Yu; Qing Da; Shi-Yong Chen; An-Xiang Zeng

arXiv:1805.10000·cs.AI·May 28, 2018·22 cites

TL;DR

This paper introduces Virtual Taobao, a simulator built from historical data using GAN-SD and MAIL, enabling reinforcement learning for online retail without physical trial costs, and demonstrates its effectiveness in improving commodity search.

Contribution

The paper presents a novel approach to simulate a complex online retail environment using GAN-SD and MAIL, facilitating reinforcement learning without real-world sampling costs.

Findings

01. Virtual Taobao faithfully replicates key properties of the real environment.

02. Policies trained in Virtual Taobao outperform traditional supervised methods online.

03. The approach reduces physical trial costs in reinforcement learning applications.

Abstract

Applying reinforcement learning in physical-world tasks is extremely challenging. It is commonly infeasible to sample a large number of trials, as required by current reinforcement learning methods, in a physical environment. This paper reports our project on using reinforcement learning for better commodity search in Taobao, one of the largest online retail platforms and meanwhile a physical environment with a high sampling cost. Instead of training reinforcement learning in Taobao directly, we present our approach: first we build Virtual Taobao, a simulator learned from historical customer behavior data through the proposed GAN-SD (GAN for Simulating Distributions) and MAIL (multi-agent adversarial imitation learning), and then we train policies in Virtual Taobao with no physical costs in which ANC (Action Norm Constraint) strategy is proposed to reduce over-fitting. In experiments,…
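
The abstract names ANC (Action Norm Constraint) as a way to reduce over-fitting to the simulator but does not give its form here. As a rough illustration only, the sketch below assumes one plausible shape for such a constraint: the reward is scaled down once the norm of the action exceeds a threshold. The function name `anc_reward` and the parameters `rho` (penalty strength) and `mu` (norm threshold) are hypothetical and are not taken from this summary.

```python
import math


def anc_reward(reward, action, rho=1.0, mu=1.0):
    """Hypothetical action-norm penalty (an assumption, not the paper's stated formula).

    The reward is left unchanged while ||action|| <= mu and is divided by a
    factor that grows linearly with the excess norm otherwise.
    Assumes non-negative rewards; dividing a negative reward would shrink
    the penalty instead of growing it.
    """
    norm = math.sqrt(sum(a * a for a in action))  # Euclidean norm of the action
    excess = max(norm - mu, 0.0)                  # how far the norm exceeds the threshold
    return reward / (1.0 + rho * excess)


# Usage: an action inside the threshold keeps its reward,
# an action far outside it is discounted.
inside = anc_reward(2.0, [3.0, 4.0], rho=0.5, mu=5.0)
outside = anc_reward(3.0, [3.0, 4.0], rho=0.5, mu=1.0)
print(inside, outside)
```

The intuition behind a penalty of this kind is that actions with unusually large norms are more likely to fall outside the region covered by the historical data the simulator was learned from, so discouraging them keeps the policy where the simulator is more trustworthy.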

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Robot Manipulation and Learning