Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

Antonio Terpin; Nicolas Lanzetti; Batuhan Yardim; Florian Dörfler; Giorgia Ramponi

arXiv:2210.11137·cs.LG·October 21, 2022·1 cites

TL;DR

This paper introduces OT-TRPO, a policy optimization algorithm for continuous state-action spaces that defines trust regions with optimal transport discrepancies (a family that includes the Wasserstein distance). It derives a dual formulation of the policy update and reports improved performance on continuous control tasks.

Contribution

The paper develops a trust region policy optimization method whose trust regions are defined by optimal transport discrepancies rather than the KL divergence. A dual reformulation of the resulting problem yields a practical algorithm for continuous state-action spaces.
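As a rough orientation, a trust region update with an optimal transport discrepancy can be written in the generic form below. This is a sketch in assumed notation, not the paper's exact formulation: the symbols $\pi$ (current policy), $\pi'$ (candidate policy), $\rho_\pi$ (state distribution), $A^\pi$ (advantage), $c$ (transport cost), and $\varepsilon$ (trust region radius) are introduced here for illustration, and the paper may place the constraint differently (for example per state rather than in expectation).

```latex
% Generic trust region policy update (assumed notation)
\max_{\pi'} \;
  \mathbb{E}_{s \sim \rho_\pi,\; a \sim \pi'(\cdot \mid s)}
  \bigl[ A^{\pi}(s, a) \bigr]
\quad \text{s.t.} \quad
  \mathbb{E}_{s \sim \rho_\pi}
  \bigl[ D_c\bigl(\pi'(\cdot \mid s),\, \pi(\cdot \mid s)\bigr) \bigr]
  \le \varepsilon

% Optimal transport discrepancy with cost c, where \Gamma(\mu, \nu)
% is the set of couplings with marginals \mu and \nu
D_c(\mu, \nu) \;=\;
  \inf_{\gamma \in \Gamma(\mu, \nu)}
  \int c(a, a') \, \mathrm{d}\gamma(a, a')
```

Choosing $c(a, a') = \lVert a - a' \rVert^p$ gives the $p$-th power of the $p$-Wasserstein distance, which is how the Wasserstein distance arises as a special case of an optimal transport discrepancy.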

Findings

01. OT-TRPO outperforms existing methods in control tasks.
02. The dual formulation simplifies computation of policy updates.
03. Optimal transport discrepancies offer advantages over traditional divergence measures.

Abstract

Policy Optimization (PO) algorithms have been proven particularly suited to handle the high-dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods represent a popular approach to stabilize the policy updates. These usually rely on the Kullback-Leibler (KL) divergence to limit the change in the policy. The Wasserstein distance represents a natural alternative, in place of the KL divergence, to define trust regions or to regularize the objective function. However, state-of-the-art works either resort to its approximations or do not provide an algorithm for continuous state-action spaces, reducing the applicability of the method. In this paper, we explore optimal transport discrepancies (which include the Wasserstein distance) to define trust regions, and we propose a novel algorithm - Optimal Transport Trust Region Policy…
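To make the contrast between the two trust region measures concrete, the following minimal sketch compares the KL divergence and the 2-Wasserstein distance between two one-dimensional Gaussian action distributions, using their standard closed forms. It is an illustration only and is not taken from the paper: the function names, the Gaussian policy assumption, and the chosen means and standard deviations are all hypothetical.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for 1-D Gaussians p = N(mu_p, sigma_p^2), q = N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def w2_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form 2-Wasserstein distance between two 1-D Gaussians."""
    return math.sqrt((mu_p - mu_q) ** 2 + (sigma_p - sigma_q) ** 2)


def in_trust_region(discrepancy, radius):
    """Hypothetical helper: does a candidate policy stay within the trust region?"""
    return discrepancy <= radius


if __name__ == "__main__":
    mean_shift = 0.1  # illustrative change in the mean action
    for sigma in (1.0, 0.1, 0.01):  # policy becoming nearly deterministic
        kl = kl_gaussian(mean_shift, sigma, 0.0, sigma)
        w2 = w2_gaussian(mean_shift, sigma, 0.0, sigma)
        print(f"sigma={sigma:<5} KL={kl:.4f}  W2={w2:.4f}")
```

For a fixed shift in the mean, the KL divergence grows as the policies become more concentrated, while the 2-Wasserstein distance depends only on how far the probability mass moves in action space. This is one commonly cited reason for considering transport-based trust regions; whether it matches the paper's own motivation should be checked against the full text.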

Videos

Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Metal-Organic Frameworks: Synthesis and Applications · Adversarial Robustness in Machine Learning