Distributionally-Constrained Policy Optimization via Unbalanced Optimal Transport

Arash Givchi; Pei Wang; Junqi Wang; Patrick Shafto

arXiv:2102.07889·cs.LG·February 17, 2021

Open Access

TL;DR

This paper casts constrained policy optimization in reinforcement learning as unbalanced optimal transport over occupancy measures, with constraints given as marginal distributions on state visitations and action executions, and derives an actor-critic algorithm for large state or action spaces.

Contribution

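The paper formulates constrained policy optimization as unbalanced optimal transport, proposes a general-purpose RL objective based on Bregman divergence that is optimized with Dykstra's algorithm, and gives an actor-critic variant for settings where only samples from the marginals are available.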

Findings

01 Effective handling of distribution constraints in RL

02 Demonstrated success on various applications

03 Robust performance with sampling-based implementation

Abstract

We consider constrained policy optimization in Reinforcement Learning, where the constraints take the form of marginals on state visitations and global action executions. Given these distributions, we formulate policy optimization as unbalanced optimal transport over the space of occupancy measures. We propose a general-purpose RL objective based on Bregman divergence and optimize it using Dykstra's algorithm. The approach admits an actor-critic algorithm for settings where the state or action space is large and only samples from the marginals are available. We discuss applications of our approach and provide demonstrations to show the effectiveness of our algorithm.
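
To make the idea of softly constrained marginals concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a tabular, one-step setting with no transition dynamics (so no Bellman flow constraint), a uniform reference measure xi, and KL divergence as the Bregman divergence. Under those assumptions, maximizing <reward, mu> - eps*KL(mu | xi) - tau_state*KL(mu 1 | rho) - tau_action*KL(mu^T 1 | eta) can be handled with unbalanced Sinkhorn-style scaling iterations, used here as a simpler stand-in for the Dykstra updates the abstract mentions. The function name unbalanced_scaling and all parameter names are hypothetical.

```python
import numpy as np


def unbalanced_scaling(reward, rho, eta, eps=0.5,
                       tau_state=10.0, tau_action=10.0, n_iters=500):
    """Sketch: KL-regularized occupancy measure with soft marginal constraints.

    Hypothetical helper, not taken from the paper. It ignores transition
    dynamics and treats the problem as entropic unbalanced optimal transport
    between a state marginal rho and an action marginal eta.
    """
    reward = np.asarray(reward, dtype=float)
    n_states, n_actions = reward.shape

    # Assumption: uniform reference measure xi over state-action pairs.
    xi = np.full((n_states, n_actions), 1.0 / (n_states * n_actions))

    # Folding the reward into the reference measure gives a Gibbs kernel.
    kernel = xi * np.exp(reward / eps)

    u = np.ones(n_states)
    v = np.ones(n_actions)

    # Exponents below 1 make the marginal constraints soft; they approach 1
    # (hard constraints, balanced Sinkhorn) as the penalty weights grow.
    p_state = tau_state / (tau_state + eps)
    p_action = tau_action / (tau_action + eps)

    for _ in range(n_iters):
        u = (rho / (kernel @ v)) ** p_state
        v = (eta / (kernel.T @ u)) ** p_action

    # Occupancy measure mu(s, a) and the policy obtained by row normalization.
    mu = u[:, None] * kernel * v[None, :]
    policy = mu / mu.sum(axis=1, keepdims=True)
    return mu, policy


# Toy problem: 3 states, 4 actions, random rewards in [0, 1].
rng = np.random.default_rng(0)
reward = rng.uniform(0.0, 1.0, size=(3, 4))
rho = np.array([0.5, 0.3, 0.2])       # target state-visitation marginal
eta = np.array([0.4, 0.3, 0.2, 0.1])  # target action-execution marginal

mu, policy = unbalanced_scaling(reward, rho, eta,
                                tau_state=1000.0, tau_action=1000.0)
print("state marginal of mu: ", mu.sum(axis=1))
print("action marginal of mu:", mu.sum(axis=0))
```

With large penalty weights the marginals of mu should sit close to rho and eta; lowering tau_state or tau_action lets the reward pull the occupancy measure away from the targets, which is the trade-off that the "unbalanced" relaxation controls.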

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management