Transfer RL via the Undo Maps Formalism
Abhi Gupta, Ted Moskovitz, David Alvarez-Melis, Aldo Pacchiano

TL;DR
This paper introduces TvD (transfer via distribution matching), a transfer reinforcement learning framework that uses distribution matching and optimal transport to adapt policies across environments whose state spaces differ, without modifying the original policies.
Contribution
The paper proposes a data-centric transfer method using optimal transport to learn environment transformations, enabling policy transfer without domain-specific assumptions.
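To make the idea concrete, here is a minimal sketch of learning an environment transformation by distribution matching. It is a toy illustration under stated assumptions, not the paper's implementation: the 5x5 grid, the eight grid symmetries used as candidate "undo" maps, the entropic (Sinkhorn) approximation of optimal transport, and all names (`sinkhorn_cost`, `fit_undo_map`, `source_policy`) are hypothetical. The source policy is left untouched and is simply fed the undone target states; the action space is assumed to be shared.

```python
import numpy as np

N = 5  # side length of a hypothetical square gridworld


def sinkhorn_cost(x, y, eps=0.2, n_iters=500):
    """Transport cost of an entropic-OT plan between two uniform point clouds."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
    a = np.full(len(x), 1.0 / len(x))  # uniform weights on source samples
    b = np.full(len(y), 1.0 / len(y))  # uniform weights on target samples
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):  # standard Sinkhorn scaling iterations
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())


# Assumed candidate family: the 8 symmetries of the grid, acting on (row, col).
CANDIDATES = {
    "identity": lambda s: s,
    "rot90": lambda s: np.stack([s[:, 1], N - 1 - s[:, 0]], axis=1),
    "rot180": lambda s: np.stack([N - 1 - s[:, 0], N - 1 - s[:, 1]], axis=1),
    "rot270": lambda s: np.stack([N - 1 - s[:, 1], s[:, 0]], axis=1),
    "flip_rows": lambda s: np.stack([N - 1 - s[:, 0], s[:, 1]], axis=1),
    "flip_cols": lambda s: np.stack([s[:, 0], N - 1 - s[:, 1]], axis=1),
    "transpose": lambda s: np.stack([s[:, 1], s[:, 0]], axis=1),
    "anti_transpose": lambda s: np.stack([N - 1 - s[:, 1], N - 1 - s[:, 0]], axis=1),
}


def fit_undo_map(source_states, target_states):
    """Pick the candidate whose output best matches the source distribution."""
    costs = {
        name: sinkhorn_cost(source_states, g(target_states))
        for name, g in CANDIDATES.items()
    }
    return min(costs, key=costs.get), costs


def source_policy(state):
    """Toy source policy: walk along the top row, then head down."""
    row, col = state
    return "right" if (row == 0 and col < N - 1) else "down"


# States visited in the source environment (an asymmetric path).
source_states = np.array(
    [[0, 0], [0, 1], [0, 2], [0, 3], [0, 4], [1, 4], [2, 4]], dtype=float
)

# Target environment: the same states seen through an unknown rotation,
# shuffled so that no source/target pairing is available.
rng = np.random.default_rng(0)
target_states = rng.permutation(CANDIDATES["rot90"](source_states))

best_name, costs = fit_undo_map(source_states, target_states)
undo = CANDIDATES[best_name]

# Act in the target environment with the unmodified source policy.
action = source_policy(undo(target_states[:1])[0])
print(best_name, action)
```

In this toy setup the search selects `rot270`, the inverse of the rotation applied to the target states, because it is the only candidate that maps the target samples back onto the source samples. A practical system would replace the finite candidate set with a parameterised map trained by gradient descent on an optimal transport objective; that choice is outside what this summary specifies.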
Findings
Successful transfer across environment transformations in gridworlds
Effective distribution matching via optimal transport
Policy transfer without modifying original policies
Abstract
Transferring knowledge across domains is one of the most fundamental problems in machine learning, but doing so effectively in the context of reinforcement learning remains largely an open problem. Current methods make strong assumptions on the specifics of the task, often lack principled objectives, and, crucially, modify individual policies, which might be sub-optimal when the domains differ due to a drift in the state space, i.e., a drift that is intrinsic to the environment and therefore affects every agent interacting with it. To address these drawbacks, we propose TvD: transfer via distribution matching, a framework to transfer knowledge across interactive domains. We approach the problem from a data-centric perspective, characterizing the discrepancy in environments by means of a (potentially complex) transformation between their state spaces, and thus posing the problem of transfer as…
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Topic Modeling
