OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Yu Luo; Tianying Ji; Fuchun Sun; Jianwei Zhang; Huazhe Xu; Xianyuan Zhan

arXiv:2405.19080·cs.LG·May 30, 2024

TL;DR

OMPO introduces a unified approach for reinforcement learning under policy and dynamics shifts by matching transition occupancy, leading to improved performance across diverse environments and settings.

Contribution

The paper proposes a novel occupancy-matching framework with a tractable min-max formulation and an actor-critic architecture, advancing RL under policy and dynamics shifts.

Findings

01. OMPO outperforms existing baselines in various environments.

02. OMPO performs well with domain randomization in robotics.

03. The method is effective under both stationary and nonstationary dynamics.

Abstract

Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors, thus often resulting in suboptimal policy performances and high learning variances. In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. In light of this, we introduce a surrogate policy learning objective by considering the transition occupancy discrepancies and then cast it into a tractable min-max optimization problem through dual reformulation. Our method, dubbed Occupancy-Matching Policy Optimization (OMPO), features a specialized actor-critic structure equipped with a…
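To make "transition occupancy" concrete, here is a minimal tabular sketch. It is not the OMPO algorithm or its min-max objective. It only uses the standard definition of the discounted transition occupancy, rho(s, a, s') = d(s) * pi(a|s) * P(s'|s, a), and measures how far apart two occupancies are when the dynamics change. The two toy MDPs, the uniform policy, the choice of KL divergence, and all function names are assumptions made for illustration.

```python
import numpy as np

def transition_occupancy(P, pi, mu0, gamma=0.9):
    """Discounted transition occupancy rho(s, a, s') of a tabular MDP.

    P:   (S, A, S) array, P[s, a, t] = probability of landing in t from (s, a)
    pi:  (S, A) array, pi[s, a] = probability of action a in state s
    mu0: (S,) initial state distribution
    """
    S = P.shape[0]
    # State-to-state kernel under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t]
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Discounted state occupancy d solves d = (1 - gamma) * mu0 + gamma * P_pi^T d
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    # rho(s, a, t) = d(s) * pi(a | s) * P(t | s, a)
    return d[:, None, None] * pi[:, :, None] * P

def kl(p, q, eps=1e-12):
    """KL(p || q) between two occupancy tensors of the same shape."""
    p, q = p.ravel(), q.ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

# Hypothetical 2-state, 2-action MDPs that differ only in their dynamics.
P_src = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
P_tgt = np.array([[[0.6, 0.4], [0.2, 0.8]],
                  [[0.5, 0.5], [0.4, 0.6]]])
pi = np.full((2, 2), 0.5)      # uniform policy (assumption)
mu0 = np.array([1.0, 0.0])     # always start in state 0 (assumption)

rho_src = transition_occupancy(P_src, pi, mu0)
rho_tgt = transition_occupancy(P_tgt, pi, mu0)
print("occupancy discrepancy KL(src || tgt):", kl(rho_src, rho_tgt))
```

With the same policy in both MDPs, the nonzero divergence comes entirely from the dynamics shift. Changing `pi` while holding `P` fixed gives a nonzero value for the same reason, which is why a single discrepancy over (s, a, s') can cover both kinds of shift.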

Code & Models

Repositories

roythuly/ompo (PyTorch, official)

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Software Reliability and Analysis Research