Actor-Critic Pretraining for Proximal Policy Optimization

Andreas Kernbach, Amr Elsheikh, Nicolas Grupp, René Nagel, and Marco F. Huber

arXiv:2602.23804 · cs.LG · March 2, 2026

Open Access

TL;DR

This paper introduces a pretraining method for actor-critic reinforcement learning algorithms that uses expert demonstrations to initialize both actor and critic networks, significantly improving sample efficiency in robotic tasks.

Contribution

The paper presents a novel pretraining approach that initializes both the actor and the critic network in actor-critic algorithms such as PPO, using expert data to improve learning efficiency.

Findings

01 Pretraining both actor and critic improves sample efficiency by 86.1% on average.

02 Actor-critic pretraining outperforms actor-only pretraining by 30.9%.

03 The method was tested on 15 robotic tasks with positive results.

Abstract

Reinforcement learning (RL) actor-critic algorithms enable autonomous learning but often require a large number of environment interactions, which limits their applicability in robotics. Leveraging expert data can reduce the number of required environment interactions. A common approach is actor pretraining, where the actor network is initialized via behavioral cloning on expert demonstrations and subsequently fine-tuned with RL. In contrast, the initialization of the critic network has received little attention, despite its central role in policy optimization. This paper proposes a pretraining approach for actor-critic algorithms like Proximal Policy Optimization (PPO) that uses expert demonstrations to initialize both networks. The actor is pretrained via behavioral cloning, while the critic is pretrained using returns obtained from rollouts of the pretrained policy. The approach is…
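
The two pretraining steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear models, the least-squares fits, the toy 1-D environment, the synthetic "expert", the discount factor of 0.99, and all function names are assumptions made for illustration only. The actor is fit to expert state-action pairs (behavioral cloning), the cloned policy is rolled out, and the critic is regressed onto the discounted returns of those rollouts.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def fit_linear(features, targets):
    """Least-squares fit with a bias term (stand-in for network training)."""
    X = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights

def predict_linear(weights, features):
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ weights

def rollout(policy_weights, rng, horizon=20):
    """Hypothetical toy environment: the action shifts a 1-D state,
    and the reward is the negative squared distance from the origin."""
    state = rng.uniform(-1.0, 1.0)
    states, rewards = [], []
    for _ in range(horizon):
        action = predict_linear(policy_weights, np.array([[state]]))[0]
        states.append(state)
        rewards.append(-state ** 2)
        state = state + action
    return np.array(states), np.array(rewards)

rng = np.random.default_rng(0)

# Step 1 (actor): behavioral cloning on synthetic expert demonstrations.
# The "expert" here is an assumed controller a = -0.5 * s.
expert_states = rng.uniform(-1.0, 1.0, size=(200, 1))
expert_actions = -0.5 * expert_states[:, 0]
actor_weights = fit_linear(expert_states, expert_actions)

# Step 2 (critic): roll out the cloned policy and regress a value
# estimate onto the observed discounted returns.
critic_inputs, critic_targets = [], []
for _ in range(50):
    states, rewards = rollout(actor_weights, rng)
    critic_inputs.append(states ** 2)  # quadratic feature, chosen for this toy task
    critic_targets.append(discounted_returns(rewards, gamma=0.99))
critic_features = np.concatenate(critic_inputs)[:, None]
critic_weights = fit_linear(critic_features, np.concatenate(critic_targets))

print("actor weights:", actor_weights)
print("critic weights:", critic_weights)
```

In an actual PPO setup, both fits would presumably be gradient-based updates of neural networks, and the resulting parameters would initialize the actor and critic before RL fine-tuning begins. The sketch only shows where each network's training targets come from.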

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Robotic Locomotion and Control