Learn Dynamic-Aware State Embedding for Transfer Learning

Kaige Yang

arXiv:2101.02230 · cs.LG · January 8, 2021

TL;DR

This paper proposes a novel transfer learning method in reinforcement learning that infers binary MDP dynamics from trajectories to learn task-agnostic state embeddings, improving sample efficiency across tasks.

Contribution

It introduces an online method to infer binary MDP dynamics from any policy, guiding state embedding learning for effective transfer in reinforcement learning.

Findings

01. Enhanced transfer learning performance in various tasks.

02. State embeddings are task and policy agnostic.

03. Improved exploration with a new intrinsic reward.

Abstract

Transfer reinforcement learning aims to improve the sample efficiency of solving unseen tasks by leveraging experience from previous tasks. We consider the setting where all tasks (MDPs) share the same environment dynamics and differ only in their reward functions. In this setting, the MDP dynamics are useful knowledge to transfer, and they can be inferred with a uniformly random policy. However, trajectories generated by a uniformly random policy are not useful for policy improvement, which severely impairs sample efficiency. Instead, we observe that the binary MDP dynamics can be inferred from trajectories of any policy, which avoids the need for a uniformly random policy. Because the binary MDP dynamics capture the state structure shared across all tasks, we believe they are suitable for transfer. Building on this observation, we introduce a method to infer the binary MDP dynamics online and at the same time utilize them…
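
To make the central observation concrete, here is a minimal tabular sketch in Python. It assumes that "binary MDP dynamic" means an indicator of which (state, action, next state) transitions are possible at all, which the abstract does not spell out. The names `update_binary_dynamics` and `is_possible` and the toy trajectories are hypothetical, and this is not the paper's implementation, which the abstract suggests works online alongside embedding learning.

```python
from typing import Iterable, Set, Tuple

# A transition is (state, action, next_state) in a small tabular MDP.
Transition = Tuple[int, int, int]


def update_binary_dynamics(
    seen: Set[Transition], trajectory: Iterable[Transition]
) -> Set[Transition]:
    """Record every observed transition as possible.

    ASSUMPTION: the binary dynamic is treated as the set of transitions
    observed at least once. Only occurrence matters, not frequency, so
    the behaviour policy's action preferences do not bias the result.
    """
    seen.update(trajectory)
    return seen


def is_possible(seen: Set[Transition], s: int, a: int, s_next: int) -> bool:
    """Return True if (s, a, s_next) has been observed so far."""
    return (s, a, s_next) in seen


# Hypothetical trajectories from two different policies in one environment.
trajectory_a = [(0, 1, 1), (1, 1, 2), (2, 1, 3)]  # policy that mostly moves right
trajectory_b = [(3, 0, 2), (2, 0, 1), (1, 1, 2)]  # policy that mostly moves left

binary_dynamics: Set[Transition] = set()
update_binary_dynamics(binary_dynamics, trajectory_a)
update_binary_dynamics(binary_dynamics, trajectory_b)
```

Because the set only records whether a transition has occurred, trajectories from different policies can be merged freely. Transitions that no policy has visited remain unknown rather than proven impossible, which is one limitation of this simplified view.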

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · EEG and Brain-Computer Interfaces