Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning
Jing-Cheng Shi, Yang Yu, Qing Da, Shi-Yong Chen, An-Xiang Zeng

TL;DR
This paper introduces Virtual Taobao, a simulator learned from historical customer behavior data using GAN-SD (GAN for Simulating Distributions) and MAIL (multi-agent adversarial imitation learning). Policies are trained in the simulator rather than on the live platform, which avoids physical trial costs, and the paper reports that this improves commodity search.
Contribution
The paper presents a novel approach to simulate a complex online retail environment using GAN-SD and MAIL, facilitating reinforcement learning without real-world sampling costs.
Findings
Virtual Taobao faithfully replicates key properties of the real environment.
Policies trained in Virtual Taobao outperform traditional supervised methods online.
The approach reduces physical trial costs in reinforcement learning applications.
Abstract
Applying reinforcement learning in physical-world tasks is extremely challenging. It is commonly infeasible to sample a large number of trials, as required by current reinforcement learning methods, in a physical environment. This paper reports our project on using reinforcement learning for better commodity search in Taobao, one of the largest online retail platforms and meanwhile a physical environment with a high sampling cost. Instead of training reinforcement learning in Taobao directly, we present our approach: first we build Virtual Taobao, a simulator learned from historical customer behavior data through the proposed GAN-SD (GAN for Simulating Distributions) and MAIL (multi-agent adversarial imitation learning), and then we train policies in Virtual Taobao with no physical costs in which ANC (Action Norm Constraint) strategy is proposed to reduce over-fitting. In experiments,…
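The abstract names the ANC (Action Norm Constraint) strategy but does not give its form here. As a rough illustration only, one way to read "action norm constraint" is to shrink the reward whenever the action's norm exceeds a reference value, for example a typical norm seen in historical data. The function name `anc_reward`, the parameters `norm_threshold` and `rho`, and the exact penalty shape are assumptions made for this sketch and are not taken from the text above. The sketch also assumes non-negative rewards.

```python
import math


def anc_reward(reward, action, norm_threshold, rho=1.0):
    """Hypothetical action-norm penalty (an assumed form, not the paper's stated one).

    The reward is left unchanged while the action's L2 norm stays at or below
    norm_threshold, and is divided by a growing factor once it exceeds it.
    Assumes reward >= 0; dividing a negative reward would reduce the penalty.
    """
    norm = math.sqrt(sum(x * x for x in action))
    excess = max(norm - norm_threshold, 0.0)  # zero inside the allowed region
    return reward / (1.0 + rho * excess)


# An action within the threshold keeps its full reward.
inside = anc_reward(1.0, [3.0, 4.0], norm_threshold=5.0)

# An action whose norm exceeds the threshold by 1 has its reward halved (rho=1).
outside = anc_reward(1.0, [3.0, 4.0], norm_threshold=4.0, rho=1.0)
```

Under this reading, a policy gains little from pushing actions far outside the range covered by the historical data, which is where a learned simulator is least reliable.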
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Robot Manipulation and Learning
