Online Policy Distillation with Decision-Attention

Xinqiang Yu, Chuanguang Yang, Chengqing Yu, Libo Huang, Zhulin An, Yongjun Xu

arXiv:2406.05488 · cs.LG · June 11, 2024

Open Access

TL;DR

This paper introduces an online policy distillation framework with decision-attention that enables multiple reinforcement learning policies to learn from each other in real-time, improving performance without relying on a pre-trained teacher.

Contribution

The work proposes a novel online policy distillation method with decision-attention, allowing diverse policies to transfer knowledge dynamically in the same environment.

Findings

1. OPD-DA outperforms independent training on Atari tasks.
2. Decision-Attention effectively measures policy importance.
3. Knowledge transfer improves reward acquisition.

Abstract

Policy Distillation (PD) has become an effective method to improve deep reinforcement learning tasks. The core idea of PD is to distill policy knowledge from a teacher agent to a student agent. However, the teacher-student framework requires a well-trained teacher model, which is computationally expensive. In light of online knowledge distillation, we study the knowledge transfer between different policies that can learn diverse knowledge from the same environment. In this work, we propose Online Policy Distillation (OPD) with Decision-Attention (DA), an online learning framework in which different policies operate in the same environment to learn different perspectives of the environment and transfer knowledge to each other to obtain better performance together. In the absence of a well-performing teacher policy, the group-derived targets play a key role in transferring group…
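As a rough illustration of the idea described in the abstract, the sketch below shows one way group-derived targets could be built from several peer policies and used as distillation targets. The abstract shown here does not specify how Decision-Attention computes its weights, so the dot-product similarity, the KL-based loss, and the function names `attention_weights`, `group_target`, and `distillation_losses` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_weights(policies, i):
    """ASSUMPTION: weight each peer by the dot-product similarity between
    its action distribution and that of policy i, then normalise."""
    scores = [sum(a * b for a, b in zip(policies[i], pj)) for pj in policies]
    return softmax(scores)


def group_target(policies, i):
    """Attention-weighted mixture of all policies, used as the soft
    target for policy i (a hypothetical 'group-derived target')."""
    w = attention_weights(policies, i)
    n_actions = len(policies[i])
    return [
        sum(w[j] * policies[j][a] for j in range(len(policies)))
        for a in range(n_actions)
    ]


def distillation_losses(logits_per_policy, temperature=1.0):
    """One KL distillation loss per policy, computed for a single state.
    No pre-trained teacher is involved: each target comes from the group."""
    policies = [softmax(l, temperature) for l in logits_per_policy]
    return [
        kl_divergence(group_target(policies, i), policies[i])
        for i in range(len(policies))
    ]


# Three hypothetical policies scoring the same state over three actions.
example_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [1.0, 1.0, 1.0]]
example_losses = distillation_losses(example_logits)
```

In a training loop, each loss in `example_losses` would typically be added to that policy's own reinforcement learning objective, so that every policy learns from the environment and from its peers at the same time. When all policies agree, the group target equals each policy's own distribution and the distillation term vanishes.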

Taxonomy

Topics: Cloud Computing and Resource Management · Access Control and Trust · Bayesian Modeling and Causal Inference

Methods: Sparse Evolutionary Training · Dense Connections · Convolution · Q-Learning · Deep Q-Network · Entropy Regularization · Proximal Policy Optimization