Relative Policy-Transition Optimization for Fast Policy Transfer
Jiawei Xu, Cheng Zhou, Yizheng Zhang, Baoxiang Wang, Lei Han

TL;DR
This paper introduces a unified framework called RPTO for fast policy transfer between different MDPs by optimizing both policy and environment dynamics, demonstrated on MuJoCo tasks.
Contribution
The paper proposes the novel RPTO algorithm that combines policy and transition optimization for efficient transfer learning across MDPs.
Findings
RPTO achieves faster policy transfer in MuJoCo tasks.
The algorithms effectively reduce the relativity gap between different environments.
RPTO outperforms baseline methods in transfer efficiency.
Abstract
We consider the problem of policy transfer between two Markov Decision Processes (MDPs). We introduce a lemma based on existing theoretical results in reinforcement learning to measure the relativity gap between two arbitrary MDPs, that is, the difference between any two cumulative expected returns defined on different policies and environment dynamics. Based on this lemma, we propose two new algorithms, referred to as Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO), which offer fast policy transfer and dynamics modelling, respectively. RPO transfers the policy evaluated in one environment to maximize the return in another, while RTO updates the parameterized dynamics model to reduce the gap between the dynamics of the two environments. Integrating the two algorithms results in the complete Relative Policy-Transition Optimization (RPTO) algorithm, in which…
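To make the notion of a relativity gap concrete, the sketch below evaluates it on a toy tabular problem. It is only an illustration and not the paper's lemma or algorithm: the function names `policy_return` and `relativity_gap`, the two-state example, the shared reward table, the discount factor, and the sign convention (target return minus source return) are all assumptions introduced here.

```python
def policy_return(P, R, pi, mu0, gamma=0.9, iters=500):
    """Discounted expected return of policy `pi` under dynamics `P`.

    P[s][a][s2] is a transition probability, R[s][a] a reward,
    pi[s][a] an action probability, mu0[s] the initial-state distribution.
    Uses plain iterative policy evaluation (hypothetical helper).
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [
            sum(
                pi[s][a] * (R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(n)))
                for a in range(len(pi[s]))
            )
            for s in range(n)
        ]
    return sum(mu0[s] * V[s] for s in range(n))


def relativity_gap(P_src, pi_src, P_tgt, pi_tgt, R, mu0, gamma=0.9):
    """Return difference between (pi_tgt, P_tgt) and (pi_src, P_src).

    Assumption: both MDPs share the reward table R and differ only in dynamics.
    """
    return policy_return(P_tgt, R, pi_tgt, mu0, gamma) - policy_return(P_src, R, pi_src, mu0, gamma)


# Toy example (made-up numbers): two states, two actions.
R = [[0.0, 1.0], [2.0, 0.0]]
P_src = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
P_tgt = [[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.3, 0.7]]]
pi_src = [[0.5, 0.5], [0.5, 0.5]]
pi_tgt = [[0.2, 0.8], [0.9, 0.1]]
mu0 = [1.0, 0.0]

gap = relativity_gap(P_src, pi_src, P_tgt, pi_tgt, R, mu0)
print(f"relativity gap (target minus source): {gap:.4f}")
```

In this toy setting the gap is zero when both the policies and the dynamics coincide, and it changes sign when the roles of source and target are swapped.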
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management
