Online Prototype Alignment for Few-shot Policy Transfer
Qi Yi, Rui Zhang, Shaohui Peng, Jiaming Guo, Yunkai Gao, Kaizhao Yuan, Ruizhi Chen, Siming Lan, Xing Hu, Zidong Du, Xishan Zhang, Qi Guo, and Yunji Chen

TL;DR
This paper introduces Online Prototype Alignment (OPA), an RL domain adaptation method that enables few-shot policy transfer by aligning elements according to their function rather than their visual appearance, using an exploration mechanism to interact with unseen elements of the target domain.
Contribution
The paper proposes a new framework for RL domain adaptation that learns functional element mappings with minimal target domain data, outperforming prior visual-based methods.
Findings
OPA achieves better transfer with fewer target samples.
It outperforms prior methods when the source and target domains look very different.
It is effective in few-shot RL policy transfer scenarios.
Abstract
Domain adaptation in reinforcement learning (RL) mainly deals with changes in observations when transferring a policy to a new environment. Many traditional approaches to domain adaptation in RL learn a mapping function between the source and target domains, either explicitly or implicitly. However, they typically require access to abundant data from the target domain. In addition, they often rely on visual cues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, we propose a novel framework, Online Prototype Alignment (OPA), which learns the mapping function based on the functional similarity of elements and achieves few-shot policy transfer within only several episodes. The key insight of OPA is to introduce an exploration mechanism that can interact with the unseen elements of the…
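To make the idea of aligning elements by function rather than appearance concrete, here is a minimal toy sketch. It is not the paper's algorithm, which the truncated abstract does not specify. It assumes each element can be summarized by a small hypothetical "functional signature" (mean reward on contact, whether it blocks movement, whether it ends the episode), that the target signatures were estimated from a few exploratory episodes, and that both domains contain the same number of elements. The names `align_by_function`, `source_protos`, and `target_stats` are illustrative only.

```python
from itertools import permutations
from math import dist


def align_by_function(source_protos, target_stats):
    """Map each target element to a source element by functional similarity.

    Assumption: a one-to-one matching that minimizes the total Euclidean
    distance between signatures. Brute force over permutations, so it is
    only suitable for a handful of elements.
    """
    src_names = list(source_protos)
    tgt_names = list(target_stats)
    if len(src_names) != len(tgt_names):
        raise ValueError("this toy sketch assumes equally many elements per domain")

    best_cost, best_perm = float("inf"), None
    for perm in permutations(src_names):
        cost = sum(
            dist(target_stats[t], source_protos[s])
            for t, s in zip(tgt_names, perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return dict(zip(tgt_names, best_perm))


# Hypothetical signatures: (mean reward on contact, blocks movement, ends episode)
source_protos = {
    "coin": (1.0, 0.0, 0.0),
    "wall": (0.0, 1.0, 0.0),
    "enemy": (-1.0, 0.0, 1.0),
}

# Target elements look nothing like the source ones; their signatures are
# noisy estimates, as they would be after only a few episodes of interaction.
target_stats = {
    "blue_blob": (-0.8, 0.0, 1.0),
    "red_star": (0.9, 0.1, 0.0),
    "green_bar": (0.1, 0.9, 0.0),
}

mapping = align_by_function(source_protos, target_stats)
print(mapping)
```

In this toy setting the matching never looks at visual features, so it is unaffected by how different the two domains look. A source-trained policy could then be reused by relabeling target elements through `mapping`.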
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
