Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

Emilio Parisotto; Ruslan Salakhutdinov

arXiv:2104.01655 · cs.LG · April 6, 2021 · 5 cites

TL;DR

This paper introduces Actor-Learner Distillation (ALD), a continual distillation procedure that transfers learning progress from a large transformer learner to a small, efficient LSTM actor, enabling high performance under actor-latency constraints on power and compute.

Contribution

The paper proposes a novel distillation procedure that allows large transformer learners to improve small LSTM actors, addressing system constraints in RL applications.

Findings

1. Distillation recovers transformer sample-efficiency in LSTM actors.
2. LSTM actors maintain fast inference and reduced training time.
3. Method enables scalable RL under actor-latency constraints.

Abstract

Many real-world applications such as robotics provide hard constraints on power and compute that limit the viable model complexity of Reinforcement Learning (RL) agents. Similarly, in many distributed RL settings, acting is done on un-accelerated hardware such as CPUs, which likewise restricts model size to prevent intractable experiment run times. These "actor-latency" constrained settings present a major obstruction to the scaling up of model complexity that has recently been extremely successful in supervised learning. To be able to utilize large model capacity while still operating within the limits imposed by the system during acting, we develop an "Actor-Learner Distillation" (ALD) procedure that leverages a continual form of distillation that transfers learning progress from a large capacity learner model to a small capacity actor model. As a case study, we develop this procedure…
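The distillation step described in the abstract can be illustrated with a minimal sketch. It assumes the small actor is trained to match the large learner's action distribution and value estimate on the same observation. The function names, the direction of the KL term, and the `value_weight` parameter are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(actor_logits, learner_logits,
                      actor_value, learner_value, value_weight=0.5):
    """Hypothetical per-step loss pulling the actor toward the learner.

    Assumption: the learner's outputs are treated as fixed targets, and
    the loss combines a policy-matching term with a value-matching term.
    """
    actor_policy = softmax(actor_logits)
    learner_policy = softmax(learner_logits)
    # Policy term: how far the actor's action distribution is from the learner's.
    policy_term = kl_divergence(learner_policy, actor_policy)
    # Value term: squared error between the two value estimates.
    value_term = 0.5 * (actor_value - learner_value) ** 2
    return policy_term + value_weight * value_term
```

In a full system this loss would be minimized with respect to the actor's parameters while the learner continues to train with its own RL objective, so the actor tracks the learner's progress over time rather than a fixed teacher.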

Videos

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory