Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

Ge Liu; Rui Wu; Heng-Tze Cheng; Jing Wang; Jayden Ooi; Lihong Li; Ang Li; Wai Lok Sibon Li; Craig Boutilier; Ed Chi

arXiv:2002.05229·cs.LG·February 14, 2020·1 cites

Open Access

TL;DR

This paper introduces ABPS, a data-efficient reinforcement learning training method that shares experience among agents with adaptively selected policies, reducing hyper-parameter tuning costs and improving performance.

Contribution

The paper proposes ABPS, a novel adaptive experience sharing algorithm, and extends it with ABPS-PBT for hyper-parameter evolution, enhancing data efficiency and convergence speed in RL training.

Findings

01. ABPS outperforms traditional hyper-parameter tuning in Atari games.

02. ABPS reduces variance among top agents.

03. ABPS-PBT accelerates convergence and further reduces variance.

Abstract

Deep Reinforcement Learning (RL) has proven powerful for decision making in simulated environments. However, training deep RL models is challenging in real-world applications such as production-scale health-care or recommender systems, because interaction is expensive and the deployment budget is limited. One source of data inefficiency is the expensive hyper-parameter tuning required when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm that shares experience collected by a behavior policy adaptively selected from a pool of agents trained with an ensemble of hyper-parameters. We further extend ABPS to evolve hyper-parameters during training by hybridizing ABPS with an adapted version of Population Based Training (ABPS-PBT). We conduct experiments with multiple Atari games with up to 16…
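The core loop described in the abstract (pick one agent from the pool as the behavior policy, collect experience with it, and let every agent learn from that shared experience) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `ToyAgent` class, the scalar `skill` proxy for policy quality, the epsilon-greedy selection rule, and all numeric values are hypothetical.

```python
import random
from collections import deque


class ToyAgent:
    """Hypothetical stand-in for one RL agent in the pool (not from the paper)."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate  # the hyper-parameter that differs per agent
        self.skill = 0.0                    # scalar proxy for policy quality
        self.updates = 0

    def run_episode(self, rng):
        # Produce a noisy episode return and a list of placeholder transitions.
        episode_return = self.skill + rng.gauss(0.0, 0.1)
        transitions = [("state", "action", episode_return)] * 4
        return episode_return, transitions

    def learn(self, batch):
        # Placeholder for an off-policy update on shared data.
        self.skill += self.learning_rate * len(batch)
        self.updates += 1


def select_behavior_agent(mean_returns, epsilon, rng):
    """Epsilon-greedy choice over agents (an assumed selection rule)."""
    if rng.random() < epsilon:
        return rng.randrange(len(mean_returns))
    return max(range(len(mean_returns)), key=lambda i: mean_returns[i])


def train_with_shared_behavior(agents, episodes, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    shared_buffer = deque(maxlen=1000)  # one replay buffer for the whole pool
    mean_returns = [0.0] * len(agents)
    counts = [0] * len(agents)
    for _ in range(episodes):
        # Only the selected agent interacts with the environment.
        i = select_behavior_agent(mean_returns, epsilon, rng)
        episode_return, transitions = agents[i].run_episode(rng)
        counts[i] += 1
        mean_returns[i] += (episode_return - mean_returns[i]) / counts[i]
        shared_buffer.extend(transitions)
        # Every agent in the pool trains on the same shared experience.
        batch = rng.sample(list(shared_buffer), k=min(8, len(shared_buffer)))
        for agent in agents:
            agent.learn(batch)
    return counts


agents = [ToyAgent(lr) for lr in (0.001, 0.01, 0.1)]
counts = train_with_shared_behavior(agents, episodes=50)
```

In this sketch only one agent interacts with the environment per episode, yet all agents receive an update, which is the sense in which the interaction cost is shared across the hyper-parameter ensemble.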

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Population Based Training