Proximal Policy Distillation

Giacomo Spigler

arXiv:2407.15134 · cs.LG · May 11, 2026


TL;DR

Proximal Policy Distillation (PPD) is a new policy distillation method that combines student-driven distillation with Proximal Policy Optimization (PPO), improving sample efficiency and robustness to imperfect demonstrations across ATARI, Mujoco, and Procgen environments.

Contribution

The paper introduces PPD, a policy distillation technique that trains the student with PPO on its own rollouts, so that the student learns both from the teacher and from the additional rewards it collects during distillation.
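
A minimal sketch of what combining PPO with a distillation term can look like, written in Python with NumPy for a discrete-action policy. It assumes the simplest combination: PPO's clipped surrogate loss plus a teacher-to-student KL term weighted by a hypothetical coefficient `distill_coef`. The function name `ppd_style_loss` and all parameters are illustrative, and the paper's exact PPD objective (for example, how the distillation term is weighted or clipped) may differ.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last (action) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ppd_style_loss(student_logits, old_logits, teacher_logits, actions,
                   advantages, clip_eps=0.2, distill_coef=1.0):
    """Hypothetical PPD-style loss (to be minimized) on student-collected data.

    Assumption: loss = PPO clipped-surrogate loss + distill_coef * KL(teacher || student).
    Returns (total, ppo_loss, distill_loss).
    """
    p_new = softmax(student_logits)      # current student policy
    p_old = softmax(old_logits)          # student policy that collected the batch
    p_teacher = softmax(teacher_logits)  # frozen teacher policy
    idx = np.arange(len(actions))

    # Importance ratio pi_theta(a|s) / pi_theta_old(a|s) for the actions taken.
    ratio = p_new[idx, actions] / p_old[idx, actions]

    # PPO clipped surrogate objective, negated to give a loss.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    ppo_loss = -np.minimum(unclipped, clipped).mean()

    # Per-state KL(teacher || student) over the full action distribution.
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_new)), axis=-1)
    distill_loss = kl.mean()

    return ppo_loss + distill_coef * distill_loss, ppo_loss, distill_loss
```

When the student, the rollout policy, and the teacher share the same logits, the ratio is 1 and the KL term vanishes, so the loss reduces to the negated mean advantage; any mismatch with the teacher adds a positive distillation penalty.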

Findings

01

PPD is more sample-efficient than the student-distill and teacher-distill baselines.

02

PPD produces higher-quality student policies across ATARI, Mujoco, and Procgen, whether the student network is smaller than, identical to, or larger than the teacher.

03

PPD is more robust than the alternative methods when distilling from imperfect demonstrations.

Abstract

We introduce Proximal Policy Distillation (PPD), a novel policy distillation method that integrates student-driven distillation and Proximal Policy Optimization (PPO) to increase sample efficiency and to leverage the additional rewards that the student policy collects during distillation. To assess the efficacy of our method, we compare PPD with two common alternatives, student-distill and teacher-distill, over a wide range of reinforcement learning environments that include discrete actions and continuous control (ATARI, Mujoco, and Procgen). For each environment and method, we perform distillation to a set of target student neural networks that are smaller, identical (self-distillation), or larger than the teacher network. Our findings indicate that PPD improves sample efficiency and produces better student policies compared to typical policy distillation approaches. Moreover, PPD…
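
The compared methods differ mainly in which policy acts in the environment. In the illustrative sketch below (the toy chain environment, the two policies, and the names `chain_step`, `collect_rollout`, and `distillation_batch` are all hypothetical), teacher-distill rolls out the teacher, while student-distill and PPD roll out the student; in every case the teacher labels the visited states with its action distribution. Following the abstract, only PPD also keeps the rewards the student collects, for use in its PPO update.

```python
import numpy as np


def chain_step(state, action):
    # Hypothetical 5-state chain: action 1 moves right, action 0 moves left.
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0
    # Reaching the goal state (4) resets the agent to state 0.
    return (0 if next_state == 4 else next_state), reward


def collect_rollout(policy, env_step, state, n_steps, rng):
    # Sample actions from `policy` and record (state, action, reward) tuples.
    states, actions, rewards = [], [], []
    for _ in range(n_steps):
        action = int(rng.choice(2, p=policy(state)))
        next_state, reward = env_step(state, action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
    return np.array(states), np.array(actions), np.array(rewards)


def distillation_batch(mode, student, teacher, env_step, state, n_steps, rng):
    if mode not in ("teacher-distill", "student-distill", "ppd"):
        raise ValueError(f"unknown mode: {mode}")
    # Teacher-distill rolls out the teacher; student-distill and PPD roll out the student.
    actor = teacher if mode == "teacher-distill" else student
    states, actions, rewards = collect_rollout(actor, env_step, state, n_steps, rng)
    # In every mode the teacher provides target action distributions on visited states.
    targets = np.stack([teacher(s) for s in states])
    # Only PPD keeps the rewards (for its PPO term); the others use the targets alone.
    return states, actions, targets, (rewards if mode == "ppd" else None)


def teacher_policy(state):
    return np.array([0.0, 1.0])  # hypothetical expert: always moves right


def student_policy(state):
    return np.array([0.5, 0.5])  # hypothetical untrained student: uniform
```

With this always-right teacher, a teacher-distill batch only ever visits states 0 through 3 along the optimal path, whereas a student-driven batch covers the states the student actually reaches, which is the distribution it will face at test time.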


Code & Models

Repositories

github