Relative Policy-Transition Optimization for Fast Policy Transfer

Jiawei Xu; Cheng Zhou; Yizheng Zhang; Baoxiang Wang; Lei Han

arXiv:2206.06009·cs.LG·January 25, 2024

Open Access

TL;DR

This paper introduces RPTO, a unified framework for fast policy transfer between two MDPs that jointly optimizes the policy and a parameterized dynamics model, and evaluates it on MuJoCo tasks.

Contribution

The paper proposes RPTO, an algorithm that combines Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO) for efficient policy transfer across MDPs.

Findings

01. RPTO achieves faster policy transfer in MuJoCo tasks.

02. The algorithms effectively reduce the relativity gap between different environments.

03. RPTO outperforms baseline methods in transfer efficiency.

Abstract

We consider the problem of policy transfer between two Markov Decision Processes (MDPs). We introduce a lemma based on existing theoretical results in reinforcement learning to measure the relativity gap between two arbitrary MDPs, that is, the difference between any two cumulative expected returns defined on different policies and environment dynamics. Based on this lemma, we propose two new algorithms, referred to as Relative Policy Optimization (RPO) and Relative Transition Optimization (RTO), which offer fast policy transfer and dynamics modelling, respectively. RPO transfers the policy evaluated in one environment to maximize the return in another, while RTO updates the parameterized dynamics model to reduce the gap between the dynamics of the two environments. Integrating the two algorithms results in the complete Relative Policy-Transition Optimization (RPTO) algorithm, in which…
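
To make the abstract's notion of a relativity gap concrete, here is a minimal Python sketch on a toy tabular problem. It is not the paper's lemma or algorithm. It only computes the quantity the abstract describes, the difference between two expected discounted returns under different policies and dynamics, and checks the elementary identity that this difference splits into a policy term and a dynamics term. The function name expected_return, the two-state MDPs, the rewards, and the discount factor are all hypothetical choices made for illustration.

```python
def expected_return(P, R, pi, mu0, gamma=0.9, sweeps=2000):
    """Discounted return J(P, pi) of a tabular MDP via iterative policy evaluation.

    P[s][a][t]: transition probability, R[s][a]: reward,
    pi[s][a]: action probability, mu0[s]: initial state distribution.
    All names and shapes are illustrative assumptions.
    """
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(sweeps):
        # One Bellman expectation backup for every state under policy pi.
        V = [
            sum(
                pi[s][a]
                * (R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s)))
                for a in range(n_a)
            )
            for s in range(n_s)
        ]
    return sum(mu0[s] * V[s] for s in range(n_s))


# Hypothetical source and target environments sharing one reward table.
R = [[0.0, 1.0], [2.0, 0.0]]
P_src = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]]
P_tgt = [[[0.6, 0.4], [0.5, 0.5]], [[0.3, 0.7], [0.4, 0.6]]]
pi_src = [[0.5, 0.5], [0.5, 0.5]]
pi_tgt = [[0.1, 0.9], [0.8, 0.2]]
mu0 = [1.0, 0.0]

# Gap between (target dynamics, target policy) and (source dynamics, source policy).
gap = expected_return(P_tgt, R, pi_tgt, mu0) - expected_return(P_src, R, pi_src, mu0)

# Telescoping split: change the policy with dynamics fixed, then change the dynamics.
policy_term = expected_return(P_tgt, R, pi_tgt, mu0) - expected_return(P_tgt, R, pi_src, mu0)
dynamics_term = expected_return(P_tgt, R, pi_src, mu0) - expected_return(P_src, R, pi_src, mu0)

print(gap, policy_term, dynamics_term)
```

Read loosely, the policy term is the kind of quantity a policy-side update such as RPO would act on, and the dynamics term is the kind a model-side update such as RTO would shrink. The paper's actual lemma and update rules are not reproduced here.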

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management