Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

Xiaoqin Zhang; Huimin Ma

arXiv:1801.10459 · cs.AI · February 12, 2018 · 32 cites

Open Access

TL;DR

This paper introduces a novel method for pretraining actor-critic reinforcement learning algorithms using expert demonstrations, improving training efficiency without relying on the global optimality assumption.

Contribution

It presents a theoretically grounded approach to pretrain both policy and value functions in actor-critic algorithms using expert demonstrations, addressing instability issues.

Findings

01. Outperforms non-pretrained algorithms in experiments
02. Enhances simulation efficiency of RL training
03. Applicable to DDPG and ACER algorithms

Abstract

Pretraining with expert demonstrations has been found useful in speeding up the training process of deep reinforcement learning algorithms, since less online simulation data is required. Some people use supervised learning to speed up the process of feature learning; others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-critic reinforcement learning algorithms. Also, some existing methods rely on the global optimum assumption, which does not hold in most scenarios. In this paper, we employ expert demonstrations in an actor-critic reinforcement learning framework, while ensuring that the performance is not affected by the fact that expert demonstrations are not globally optimal. We theoretically derive a method for computing policy gradients and value estimators with only expert demonstrations. Our method is…
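
For orientation, the policy-imitation baseline that the abstract contrasts against (pretraining a policy by imitating expert demonstrations) can be sketched as plain behavior cloning. This is not the paper's method, which derives policy gradients and value estimators for actor-critic algorithms. The linear softmax policy, the toy one-dimensional task, and every name below (`behavior_cloning`, `make_toy_demos`, and so on) are illustrative assumptions.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_probs(weights, state):
    """Action probabilities of a linear softmax policy (assumed form)."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(logits)

def behavior_cloning(demos, n_actions, n_features, lr=0.5, epochs=200):
    """Fit the policy to (state, expert_action) pairs by minimizing
    the cross-entropy -log pi(expert_action | state) with SGD."""
    weights = [[0.0] * n_features for _ in range(n_actions)]
    for _ in range(epochs):
        for state, expert_action in demos:
            probs = policy_probs(weights, state)
            for a in range(n_actions):
                # Gradient of the cross-entropy loss w.r.t. logit a.
                grad_logit = probs[a] - (1.0 if a == expert_action else 0.0)
                for j in range(n_features):
                    weights[a][j] -= lr * grad_logit * state[j]
    return weights

def make_toy_demos(n=50, seed=0):
    """Hypothetical expert: picks action 1 when x > 0, else action 0."""
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        state = [x, 1.0]  # one feature plus a constant bias term
        demos.append((state, 1 if x > 0 else 0))
    return demos

demos = make_toy_demos()
weights = behavior_cloning(demos, n_actions=2, n_features=2)

def greedy_action(state):
    probs = policy_probs(weights, state)
    return max(range(len(probs)), key=lambda a: probs[a])

accuracy = sum(greedy_action(s) == a for s, a in demos) / len(demos)
print(f"agreement with expert on the demonstrations: {accuracy:.2f}")
```

A policy cloned this way only matches the expert on demonstrated states and has no value estimate to accompany it, which is one reason such pretraining does not transfer directly to an actor-critic setup, where a critic must also be initialized.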

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Adversarial Robustness in Machine Learning

Methods: Entropy Regularization · Softmax · Trust Region Policy Optimization · Retrace · Stochastic Dueling Network · ACER · Experience Replay · Dense Connections · Weight Decay