Actor-Critic Pretraining for Proximal Policy Optimization
Andreas Kernbach, Amr Elsheikh, Nicolas Grupp, René Nagel, and Marco F. Huber

TL;DR
This paper introduces a pretraining method for actor-critic reinforcement learning algorithms that uses expert demonstrations to initialize both actor and critic networks, significantly improving sample efficiency in robotic tasks.
Contribution
It presents a novel pretraining approach for both actor and critic networks in actor-critic algorithms like PPO, leveraging expert data to enhance learning efficiency.
Findings
Pretraining both the actor and the critic improves sample efficiency by 86.1% on average.
Pretraining both networks outperforms actor-only pretraining by 30.9%.
The method was tested on 15 robotic tasks with positive results.
Abstract
Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A common approach is actor pretraining, where the actor network is initialized via behavioral cloning on expert demonstrations and subsequently fine-tuned with RL. In contrast, the initialization of the critic network has received little attention, despite its central role in policy optimization. This paper proposes a pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks. The actor is pretrained via behavioral cloning, while the critic is pretrained using returns obtained from rollouts of the pretrained policy. The approach is…
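The critic pretraining step described in the abstract (regressing the critic onto returns from rollouts of the pretrained policy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of discounted Monte Carlo returns, the discount factor, the linear least-squares critic standing in for a critic network, the toy rollout data, and the function names `discounted_returns`, `fit_linear_critic`, and `critic_value`.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for a single episode.

    Assumption: discounted Monte Carlo returns; the paper's exact
    return definition may differ.
    """
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Walk the episode backwards so each step reuses the next step's return.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def _with_bias(states):
    """Append a constant feature so the linear critic has an intercept."""
    return np.hstack([states, np.ones((len(states), 1))])


def fit_linear_critic(states, returns):
    """Least-squares fit of V(s) to the return targets.

    A linear model is a hypothetical stand-in for the critic network.
    """
    weights, *_ = np.linalg.lstsq(_with_bias(states), returns, rcond=None)
    return weights


def critic_value(weights, states):
    """Evaluate the fitted linear critic on a batch of states."""
    return _with_bias(states) @ weights


# Toy rollout of an (already pretrained) policy: three 1-D states, reward 1 each.
states = np.array([[0.0], [1.0], [2.0]])
rewards = np.array([1.0, 1.0, 1.0])

targets = discounted_returns(rewards, gamma=0.5)  # [1.75, 1.5, 1.0]
weights = fit_linear_critic(states, targets)
values = critic_value(weights, states)
```

In an actual setup, the rollouts would come from the behavior-cloned actor interacting with the environment, and the critic would be a neural network trained by gradient descent on the same regression targets before PPO fine-tuning begins.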
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Locomotion and Control
