Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion
Josh Roy, George Konidaris

TL;DR
This paper presents WAPPO, a reinforcement learning algorithm that uses the Wasserstein distance to align feature distributions between a source and a target task for visual transfer, outperforming the prior state of the art.
Contribution
WAPPO introduces a Wasserstein-based adversarial approach for visual transfer in reinforcement learning, explicitly aligning feature distributions between source and target tasks.
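For readers unfamiliar with "Wasserstein-based adversarial" training, the standard starting point is the Kantorovich-Rubinstein dual of the Wasserstein-1 distance, the same form used by WGAN-style critics. The block below is a hedged sketch of that general form, not the paper's exact loss. The symbols are labels chosen here: P_s and P_t for the source and target feature distributions, f_ω for a critic, and φ for the feature extractor. This summary does not state how WAPPO enforces the Lipschitz constraint or how it weights this term against the PPO objective.

```latex
W_1(P_s, P_t) = \sup_{\lVert f \rVert_L \le 1}
  \mathbb{E}_{z \sim P_s}\left[f(z)\right] - \mathbb{E}_{z \sim P_t}\left[f(z)\right]

% Adversarial approximation: a critic f_\omega (kept approximately 1-Lipschitz)
% maximizes the gap, while the feature extractor \phi minimizes it.
\min_{\phi} \; \max_{\omega} \;
  \mathbb{E}_{x \sim \mathcal{D}_s}\left[f_\omega(\phi(x))\right]
  - \mathbb{E}_{x \sim \mathcal{D}_t}\left[f_\omega(\phi(x))\right]
```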
Findings
WAPPO outperforms the previous state of the art on visual transfer tasks.
WAPPO successfully transfers policies across Visual Cartpole and Procgen environments.
Explicitly aligning the feature distributions of the source and target tasks improves transfer performance.
Abstract
We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in Reinforcement Learning that explicitly learns to align the distributions of extracted features between a source and target task. WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. WAPPO outperforms the prior state-of-the-art in visual transfer and successfully transfers policies across Visual Cartpole and two instantiations of 16 OpenAI Procgen environments.
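To make "approximates and minimizes the Wasserstein-1 distance between the distributions of features" concrete, here is a toy sketch under strong simplifying assumptions. It is not WAPPO's method. Features are scalars, and instead of an adversarial critic it uses the exact closed form for 1-D empirical samples of equal size, which is the mean absolute difference of the sorted samples. The hypothetical learnable shift `b` stands in for the feature extractor. It is trained by subgradient descent to pull target features onto the source distribution. The function names `wasserstein1_1d` and `align_by_shift` are illustrative only.

```python
from typing import List, Sequence


def wasserstein1_1d(source: Sequence[float], target: Sequence[float]) -> float:
    """Exact W1 between two equal-size 1-D empirical samples.

    With uniform weights and equal sample sizes, the optimal transport plan
    pairs the i-th smallest source value with the i-th smallest target value.
    """
    if len(source) != len(target) or len(source) == 0:
        raise ValueError("expected two non-empty samples of equal size")
    pairs = zip(sorted(source), sorted(target))
    return sum(abs(s - t) for s, t in pairs) / len(source)


def align_by_shift(source: Sequence[float], target: Sequence[float],
                   lr: float = 0.1, steps: int = 200) -> float:
    """Learn a scalar shift b minimizing W1(source, target + b).

    Hypothetical stand-in for a feature extractor: adding b to every target
    feature does not change the sort order, so the sorted pairing is fixed.
    """
    s_sorted: List[float] = sorted(source)
    t_sorted: List[float] = sorted(target)
    n = len(s_sorted)
    b = 0.0
    for _ in range(steps):
        # Subgradient of mean |s_i - (t_i + b)| w.r.t. b is mean(-sign(s_i - t_i - b)).
        grad = 0.0
        for s, t in zip(s_sorted, t_sorted):
            d = s - t - b
            if d > 0:
                grad -= 1.0
            elif d < 0:
                grad += 1.0
        b -= lr * grad / n
    return b


source_feats = [0.0, 1.0, 2.0, 3.0]
target_feats = [8.0, 5.0, 7.0, 6.0]  # same shape, shifted by +5
print(wasserstein1_1d(source_feats, target_feats))  # 5.0
b = align_by_shift(source_feats, target_feats)
print(wasserstein1_1d(source_feats, [t + b for t in target_feats]))  # close to 0
```

With a constant step size, the subgradient iterate settles within about one step (`lr`) of the optimal shift of -5. In WAPPO, an adversarially trained critic replaces the closed form, since the sorting trick only applies to 1-D features.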
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Cell Image Analysis Techniques
