Online Policy Distillation with Decision-Attention
Xinqiang Yu, Chuanguang Yang, Chengqing Yu, Libo Huang, Zhulin An, Yongjun Xu

TL;DR
This paper introduces an online policy distillation framework with decision-attention that lets multiple reinforcement learning policies learn from each other during training, improving performance without relying on a pre-trained teacher.
Contribution
The work proposes a novel online policy distillation method with decision-attention, allowing diverse policies to transfer knowledge dynamically in the same environment.
Findings
OPD-DA outperforms independent training on Atari tasks.
Decision-Attention effectively measures policy importance.
Knowledge transfer improves reward acquisition.
Abstract
Policy Distillation (PD) has become an effective method to improve deep reinforcement learning tasks. The core idea of PD is to distill policy knowledge from a teacher agent to a student agent. However, the teacher-student framework requires a well-trained teacher model, which is computationally expensive. In light of online knowledge distillation, we study knowledge transfer between different policies that can learn diverse knowledge from the same environment. In this work, we propose Online Policy Distillation (OPD) with Decision-Attention (DA), an online learning framework in which different policies operate in the same environment, learn different perspectives of it, and transfer knowledge to each other to obtain better performance together. In the absence of a well-performing teacher policy, the group-derived targets play a key role in transferring group…
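The idea of group-derived targets can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how Decision-Attention scores are computed, so the dot-product similarity between action distributions used here is an assumption, and the function names (softmax, group_targets, kl_divergence) are hypothetical. Each policy receives a soft target that is an attention-weighted mixture of all policies' action distributions, and a KL term measures how far the policy is from that target.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def group_targets(policy_logits):
    """Build one soft target per policy from the whole group.

    policy_logits: one list of action logits per policy, all for the same state.
    ASSUMPTION: attention scores are dot products between action
    distributions; the paper's Decision-Attention may be defined differently.
    """
    probs = [softmax(logits) for logits in policy_logits]
    n_actions = len(probs[0])
    targets = []
    for p_i in probs:
        # Score every policy (including p_i itself) against policy i.
        scores = [sum(a * b for a, b in zip(p_i, p_j)) for p_j in probs]
        weights = softmax(scores)
        # The target is the attention-weighted mixture of the group's distributions.
        target = [
            sum(w * p_j[a] for w, p_j in zip(weights, probs))
            for a in range(n_actions)
        ]
        targets.append(target)
    return probs, targets

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Three hypothetical policies scoring four actions in the same state.
logits = [[2.0, 0.5, 0.1, -1.0], [1.5, 1.0, 0.0, -0.5], [0.2, 2.2, 0.3, 0.0]]
probs, targets = group_targets(logits)
distill_losses = [kl_divergence(t, p) for t, p in zip(targets, probs)]
```

In a training loop, each distillation loss would be added to that policy's usual reinforcement learning loss, with the target treated as a constant so that gradients flow only into the policy being updated.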
Taxonomy
Topics: Cloud Computing and Resource Management · Access Control and Trust · Bayesian Modeling and Causal Inference
Methods: Sparse Evolutionary Training · Dense Connections · Convolution · Q-Learning · Deep Q-Network · Entropy Regularization · Proximal Policy Optimization
