Policy Distillation

Andrei A. Rusu; Sergio Gomez Colmenarejo; Caglar Gulcehre; Guillaume Desjardins; James Kirkpatrick; Razvan Pascanu; Volodymyr Mnih; Koray Kavukcuoglu; Raia Hadsell

arXiv:1511.06295 · cs.LG · January 8, 2016 · ICLR · 87 cites

TL;DR

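This paper introduces policy distillation, a method that compresses a reinforcement learning policy into a much smaller network performing at the expert level and consolidates several task-specific policies into one. On Atari, the multi-task distilled agent outperforms both its single-task teachers and a jointly-trained DQN agent.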

Contribution

The paper presents a novel policy distillation technique that creates compact, high-performing reinforcement learning policies and consolidates multiple task-specific policies into a single model.

Findings

01. Distilled policies match or outperform expert policies.

02. Multi-task distilled agent surpasses single-task and jointly-trained DQN agents.

03. Method reduces network size and training complexity.

Abstract

Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
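The abstract does not say which objective is used to transfer the teacher's policy to the smaller network. As a minimal sketch of one plausible formulation, the snippet below treats distillation as matching a softened distribution over the teacher's Q-values with a KL divergence. The function names, the choice of KL as the loss, and the default temperature of 0.01 are assumptions made for illustration, not details taken from this page.

```python
import math


def log_softmax(values, temperature=1.0):
    """Numerically stable log-softmax of a list of floats, after dividing by temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    log_total = math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - peak - log_total for s in scaled]


def distillation_kl(teacher_q, student_logits, temperature=0.01):
    """KL(teacher || student) for a single state (illustrative, hypothetical helper).

    teacher_q:       Q-values produced by the teacher network, one per action.
    student_logits:  unnormalised action scores produced by the student network.
    temperature:     assumed value; a small temperature sharpens the teacher's
                     distribution towards its greedy action.
    """
    log_p = log_softmax(teacher_q, temperature)   # softened teacher distribution
    log_q = log_softmax(student_logits)           # student distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Usage: a student that prefers the teacher's best action incurs a lower loss.
teacher_q = [1.0, 1.2, 0.9]
agrees = distillation_kl(teacher_q, [0.0, 5.0, 0.0])
disagrees = distillation_kl(teacher_q, [5.0, 0.0, 0.0])
print(agrees < disagrees)
```

In a multi-task setting, the same per-state loss could be averaged over batches drawn from each teacher's own game, which is one way to read the consolidation claim in the abstract.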

Code & Models

Repositories

dsapandora/s_cera
tf

Taxonomy

Topics

Neural Networks and Reservoir Computing · CCD and CMOS Imaging Sensors · Visual Attention and Saliency Detection

Methods

Q-Learning · Dense Connections · Convolution · Deep Q-Network