Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation
Emilio Parisotto, Ruslan Salakhutdinov

TL;DR
This paper introduces Actor-Learner Distillation (ALD), a method that continually transfers learning progress from a large transformer learner to a small, efficient LSTM actor in reinforcement learning, so that the actor recovers the transformer's sample efficiency while staying within actor-latency constraints.
Contribution
The paper proposes a novel distillation procedure that allows large transformer learners to improve small LSTM actors, addressing system constraints in RL applications.
Findings
Distillation recovers transformer sample-efficiency in LSTM actors.
LSTM actors maintain fast inference and reduced training time.
Method enables scalable RL under actor-latency constraints.
Abstract
Many real-world applications such as robotics provide hard constraints on power and compute that limit the viable model complexity of Reinforcement Learning (RL) agents. Similarly, in many distributed RL settings, acting is done on un-accelerated hardware such as CPUs, which likewise restricts model size to prevent intractable experiment run times. These "actor-latency" constrained settings present a major obstruction to the scaling up of model complexity that has recently been extremely successful in supervised learning. To be able to utilize large model capacity while still operating within the limits imposed by the system during acting, we develop an "Actor-Learner Distillation" (ALD) procedure that leverages a continual form of distillation that transfers learning progress from a large capacity learner model to a small capacity actor model. As a case study, we develop this procedure…
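The abstract describes distillation from a large learner to a small actor but, in the portion shown here, does not spell out the loss. As a rough illustration only, the sketch below shows one common way such a distillation objective can be written: a KL term that pulls the actor's action distribution toward the learner's, plus a squared-error term on their value estimates. The function names, the direction of the KL term, and the `value_weight` parameter are all assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def distillation_loss(actor_logits, learner_logits,
                      actor_value, learner_value,
                      value_weight=0.5):
    """Hypothetical per-state distillation loss (illustrative only).

    policy term: KL(actor || learner) over action probabilities
    value term:  0.5 * (learner_value - actor_value)^2
    The KL direction and value_weight are assumptions of this sketch.
    """
    pi_actor = softmax(actor_logits)
    pi_learner = softmax(learner_logits)
    policy_term = kl_divergence(pi_actor, pi_learner)
    value_term = 0.5 * (learner_value - actor_value) ** 2
    return policy_term + value_weight * value_term


if __name__ == "__main__":
    # Small LSTM actor vs. large transformer learner on a single state.
    loss = distillation_loss(
        actor_logits=[0.1, 0.2, 0.7],
        learner_logits=[0.0, 0.5, 1.5],
        actor_value=0.3,
        learner_value=0.8,
    )
    print(f"distillation loss: {loss:.4f}")
```

In a setup like the one the abstract outlines, a loss of this kind would be minimized with respect to the actor's parameters only, with the learner's outputs treated as fixed targets, while the learner itself continues to train with its usual RL objective.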
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
