Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations
Xiaoqin Zhang, Huimin Ma

TL;DR
This paper introduces a novel method for pretraining actor-critic reinforcement learning algorithms using expert demonstrations, improving training efficiency without relying on the global optimality assumption.
Contribution
It presents a theoretically grounded approach to pretrain both policy and value functions in actor-critic algorithms using expert demonstrations, addressing instability issues.
Findings
Outperforms non-pretrained algorithms in experiments
Enhances simulation efficiency of RL training
Applicable to DDPG and ACER algorithms
Abstract
Pretraining with expert demonstrations has been found useful in speeding up the training process of deep reinforcement learning algorithms, since less online simulation data is required. Some approaches use supervised learning to speed up feature learning, while others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-critic reinforcement learning algorithms. Also, some existing methods rely on the global optimum assumption, which does not hold in most scenarios. In this paper, we employ expert demonstrations in an actor-critic reinforcement learning framework, while ensuring that performance is not affected by the fact that expert demonstrations are not globally optimal. We theoretically derive a method for computing policy gradients and value estimators with only expert demonstrations. Our method is…
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Adversarial Robustness in Machine Learning
Methods: Entropy Regularization · Softmax · Trust Region Policy Optimization · Retrace · Stochastic Dueling Network · ACER · Experience Replay · Dense Connections · Weight Decay
