Online Prototype Alignment for Few-shot Policy Transfer

Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Yunkai Gao, Kaizhao Yuan, Ruizhi Chen, Siming Lan, Xing Hu, Zidong Du, Xishan Zhang, Qi Guo, and Yunji Chen

arXiv:2306.07307 · cs.LG · June 14, 2023 · 1 citation

TL;DR

This paper introduces Online Prototype Alignment (OPA), an RL domain adaptation method that enables few-shot policy transfer by aligning elements based on functionality rather than visual similarity, using an exploration mechanism to interact with unseen elements.

Contribution

The paper proposes a new framework for RL domain adaptation that learns functional element mappings with minimal target domain data, outperforming prior visual-based methods.

Findings

01. OPA achieves better transfer than prior methods while using fewer target-domain samples.

02. It outperforms prior methods when the source and target domains are visually different.

03. It is effective in few-shot RL policy transfer scenarios.

Abstract

Domain adaptation in reinforcement learning (RL) mainly deals with changes in observations when a policy is transferred to a new environment. Many traditional approaches to domain adaptation in RL learn a mapping function between the source and target domains, either explicitly or implicitly. However, they typically require access to abundant data from the target domain. In addition, they often rely on visual cues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, we propose a novel framework, Online Prototype Alignment (OPA), which learns the mapping function based on the functional similarity of elements and achieves few-shot policy transfer within only several episodes. The key insight of OPA is to introduce an exploration mechanism that can interact with the unseen elements of the…
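To make the idea of aligning elements by function rather than appearance concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes each element can be summarized by a hypothetical "functional signature" (mean reward on contact, episode-end rate) and swaps in plain nearest-neighbor matching as a stand-in for the learned alignment. All element names, numbers, and helper functions (`align`, `translate`) are invented for illustration.

```python
from math import dist

# Hypothetical functional signatures for source-domain elements:
# (mean reward on contact, episode-end rate). Values are made up for illustration.
SOURCE_PROTOTYPES = {
    "coin": (1.0, 0.0),
    "enemy": (-1.0, 1.0),
    "wall": (0.0, 0.0),
}


def align(target_signatures, source_prototypes):
    """Map each target element to the source element with the nearest signature.

    Nearest-neighbor matching is an assumption of this sketch, not the
    alignment procedure described in the paper.
    """
    mapping = {}
    for name, sig in target_signatures.items():
        mapping[name] = min(
            source_prototypes, key=lambda s: dist(sig, source_prototypes[s])
        )
    return mapping


def translate(observation, mapping):
    """Rewrite a target observation (grid of element names) in source vocabulary."""
    return [[mapping.get(cell, cell) for cell in row] for row in observation]


# Signatures that a few exploratory episodes in the target domain might yield
# (again hypothetical): the elements look different but behave like source ones.
target_signatures = {
    "gem": (0.9, 0.0),
    "ghost": (-0.8, 0.9),
    "fence": (0.0, 0.1),
}

mapping = align(target_signatures, SOURCE_PROTOTYPES)
obs = [["fence", "gem"], ["ghost", "fence"]]
print(mapping)
print(translate(obs, mapping))
```

Under these assumptions, a policy trained on source observations could act on the translated grid without ever seeing what the target elements look like; the quality of the mapping then depends on how well exploration exposes each unseen element's behavior.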

Code & Models

Repositories

albertcity/opa (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning
