Similarity-based Knowledge Transfer for Cross-Domain Reinforcement Learning
Sergio A. Serrano, Jose Martinez-Carranza, L. Enrique Sucar

TL;DR
This paper introduces a similarity-based method for cross-domain reinforcement learning that selects a source task and transfers knowledge from it without requiring aligned data or expert policies, demonstrated on MuJoCo control tasks.
Contribution
It proposes a semi-supervised alignment loss for measuring task similarity and selecting source tasks for transfer in reinforcement learning.
Findings
The method transfers knowledge across a varied set of MuJoCo control tasks.
The approach does not require aligned or expert-collected data.
Source selection and knowledge transfer remain robust without source tasks tailored to the target.
Abstract
Transferring knowledge in cross-domain reinforcement learning is a challenging setting in which learning is accelerated by reusing knowledge from a task with a different observation and/or action space. However, it is often necessary to carefully select the source of knowledge for the receiving end to benefit from the transfer process. In this article, we study how to measure the similarity between cross-domain reinforcement learning tasks to select a source of knowledge that will improve the performance of the learning agent. We developed a semi-supervised alignment loss to match different spaces with a set of encoder-decoders, and use them to measure similarity and transfer policies across tasks. In comparison to prior works, our method does not require data to be aligned, paired or collected by expert policies. Experimental results, on a set of varied MuJoCo control tasks, show the…
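The source-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's semi-supervised alignment loss is not specified here, so a plain round-trip reconstruction error stands in for it, the encoders and decoders are assumed to be linear matrices, and every name (`cycle_reconstruction_loss`, `select_source`, `source_a`, `source_b`) is hypothetical.

```python
import numpy as np


def cycle_reconstruction_loss(target_states, encoder, decoder):
    """Mean squared error after mapping target states to a source space and back.

    Stand-in for an alignment loss (assumption): linear maps only.
    encoder has shape (target_dim, source_dim), decoder (source_dim, target_dim).
    """
    reconstructed = target_states @ encoder @ decoder
    return float(np.mean((target_states - reconstructed) ** 2))


def select_source(alignment_losses):
    """Return the candidate source task with the lowest alignment loss."""
    return min(alignment_losses, key=alignment_losses.get)


rng = np.random.default_rng(0)
# Unpaired state samples from the target task; no expert policy is assumed.
target_states = rng.normal(size=(256, 4))

# Hypothetical candidates, each with an already-trained encoder/decoder pair.
identity = np.eye(4)
candidates = {
    "source_a": (identity, identity),        # lossless round trip
    "source_b": (identity, 0.5 * identity),  # lossy round trip
}

losses = {
    name: cycle_reconstruction_loss(target_states, enc, dec)
    for name, (enc, dec) in candidates.items()
}
best = select_source(losses)
print(best, losses)
```

A lower loss is read here as higher similarity, so the candidate with the smallest value is the one a policy would be transferred from.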
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robotic Locomotion and Control
Methods: Sparse Evolutionary Training
