Distill Knowledge in Multi-task Reinforcement Learning with Optimal-Transport Regularization

Bang Giang Le; Viet Cuong Ta

arXiv:2309.15603·cs.LG·September 28, 2023


TL;DR

This paper introduces an Optimal Transport-based regularization method for multi-task reinforcement learning. It replaces the traditional Kullback-Leibler (KL) divergence regularizer, with the aim of improving knowledge transfer and training efficiency across related tasks.

Contribution

It proposes a novel regularization approach using Optimal Transport with Sinkhorn mapping to enhance multi-task RL training.
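
For readers unfamiliar with the term, the following is the standard textbook form of entropy-regularized Optimal Transport that Sinkhorn iterations solve. The notation (weights $a$, $b$, cost matrix $C$, regularization strength $\varepsilon$) is assumed here for illustration and is not taken from the paper, which may use a different formulation.

```latex
% Entropy-regularized OT between discrete distributions with weights a and b
% and pairwise cost matrix C (standard form; notation assumed, not the paper's).
\[
  \mathrm{OT}_{\varepsilon}(a, b)
  = \min_{P \in \Pi(a, b)} \sum_{i,j} P_{ij} C_{ij}
    + \varepsilon \sum_{i,j} P_{ij} \left( \log P_{ij} - 1 \right),
  \qquad
  \Pi(a, b) = \left\{ P \ge 0 : P \mathbf{1} = a,\ P^{\top} \mathbf{1} = b \right\}.
\]
% Sinkhorn iterations alternate two scaling updates on the Gibbs kernel K:
\[
  K = \exp\!\left( -C / \varepsilon \right), \qquad
  u \leftarrow a \oslash (K v), \qquad
  v \leftarrow b \oslash (K^{\top} u), \qquad
  P = \operatorname{diag}(u)\, K \operatorname{diag}(v).
\]
```

Here $\oslash$ denotes element-wise division. Unlike the KL divergence, the transport cost stays finite and informative when the two distributions have little or no overlapping support, which is one common motivation for preferring it.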

Findings

01  Speeds up the learning process of agents.

02  Outperforms baseline methods in multi-task learning.

03  Effective on grid-based navigation tasks.

Abstract

In multi-task reinforcement learning, it is possible to improve the data efficiency of training agents by transferring knowledge from other different but related tasks. Because the experiences from different tasks are usually biased toward the specific task goals, traditional methods rely on Kullback-Leibler regularization to stabilize the transfer of knowledge from one task to the others. In this work, we explore the direction of replacing the Kullback-Leibler divergence with a novel Optimal Transport-based regularization. By using the Sinkhorn mapping, we can approximate the Optimal Transport distance between the state distributions of tasks. The distance is then used as an amortized reward to regularize the amount of shared information. We evaluate our framework on several grid-based multi-goal navigation tasks to validate the effectiveness of the approach. The results show that our…
