Reinforcement Learning with Wasserstein Distance Regularisation, with Applications to Multipolicy Learning

Mohammed Amin Abdullah; Aldo Pacchiano; Moez Draief

arXiv:1802.03976 · cs.LG · August 1, 2019 · 1 citation

TL;DR

This paper introduces a Wasserstein distance-based regularization method for reinforcement learning, enabling the learning of multiple diverse policies and controlling policy trajectory distributions.

Contribution

The paper uses the Wasserstein distance to quantify how much policies differ, and proposes a regulariser for learning either multiple distinct policies or a policy drawn toward a fixed target distribution.

Findings

01 Effective in learning diverse policies with controlled differences.

02 Can attract policy distributions to a fixed target.

03 Provides a new regularization framework for policy optimization.

Abstract

We describe an application of Wasserstein distance to Reinforcement Learning. The Wasserstein distance in question is between the distribution of mappings of trajectories of a policy into some metric space, and some other fixed distribution (which may, for example, come from another policy). Different policies induce different distributions, so given an underlying metric, the Wasserstein distance quantifies how different policies are. This can be used to learn multiple policies which are different in terms of such Wasserstein distances by using a Wasserstein regulariser. By changing the sign of the regularisation parameter, one can learn a policy whose trajectory mapping distribution is attracted to a given fixed distribution.
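
To make the role of the regulariser concrete, here is a minimal sketch and not the authors' implementation. It assumes that each trajectory is mapped to a single real number, that both distributions are given as equal-size samples, and that the regulariser is added to the mean return with a coefficient `lam`. The function names, the additive form of the objective, and the sign convention (positive `lam` rewards distance, negative `lam` penalises it) are all illustrative assumptions. In one dimension, the Wasserstein-1 distance between two equal-size empirical samples is the mean absolute difference of the sorted samples.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    With uniform weights and equal sample sizes, the optimal coupling in
    one dimension pairs the sorted samples, so the distance is the mean
    absolute difference between order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("expected two 1-D samples of equal size")
    return float(np.mean(np.abs(x - y)))


def regularised_objective(returns, embeddings, target_samples, lam):
    """Hypothetical objective: mean return plus lam times the distance.

    Assumed sign convention (not taken from the paper): maximising with
    lam > 0 pushes the policy's trajectory-mapping distribution away from
    the target; lam < 0 pulls it toward the target.
    """
    distance = wasserstein_1d(embeddings, target_samples)
    return float(np.mean(returns)) + lam * distance


# Toy data: a fixed target distribution and two candidate policies whose
# trajectory mappings are shifted copies of it.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=1000)
near = target + 0.1   # mappings close to the target
far = target + 2.0    # mappings far from the target
returns = np.ones(1000)  # identical returns, so only the regulariser differs

repel_near = regularised_objective(returns, near, target, lam=0.5)
repel_far = regularised_objective(returns, far, target, lam=0.5)
attract_near = regularised_objective(returns, near, target, lam=-0.5)
attract_far = regularised_objective(returns, far, target, lam=-0.5)
```

With equal returns, the positive coefficient ranks the far policy higher and the negative coefficient ranks the near policy higher, which mirrors the sign flip described in the abstract. Mappings into higher-dimensional metric spaces would need a general optimal transport solver rather than the sorting shortcut used here.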

Taxonomy

Topics: Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning