Fingerprint Policy Optimisation for Robust Reinforcement Learning

Supratik Paul; Michael A. Osborne; Shimon Whiteson

arXiv:1805.10662 · cs.LG · May 28, 2019 · 1 cites

TL;DR

This paper introduces Fingerprint Policy Optimisation (FPO), which uses Bayesian optimisation to select the distribution of environment variables sampled during training, so that policy gradient methods learn policies that are robust to rare but impactful events more efficiently.

Contribution

FPO is a novel approach that actively optimises the distribution of environment variables, using low-dimensional fingerprints of the current policy to keep the optimisation practical, and thereby enhances policy robustness in reinforcement learning.

Findings

01. FPO efficiently learns robust policies against rare events.

02. FPO outperforms standard policy gradient methods in robustness.

03. FPO effectively adapts environment variables during training.

Abstract

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator. This can lead to slow learning, or convergence to suboptimal policies, if the environment variable has a large impact on the transition dynamics. In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the distribution of environment variables. The central idea is to use Bayesian optimisation (BO) to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this BO practical, we contribute two easy-to-compute low-dimensional fingerprints of the current policy. Our experiments show that FPO can…
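
The central idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The two-valued environment variable, the return function, the Bernoulli sampling parameter `psi`, the use of the scalar policy parameter `w` as its own fingerprint, the small Gaussian-process surrogate with an RBF kernel, the UCB acquisition rule, and the plain stochastic gradient step standing in for a policy gradient update are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

P_RARE = 0.05      # assumed true probability of the rare environment setting
ACTION_STD = 0.1   # assumed noise of the toy 1-D Gaussian policy

def expected_return(w):
    """Exact expected return of the toy problem under the TRUE distribution."""
    common = -((w - 1.0) ** 2 + ACTION_STD ** 2)        # theta = 0 (common)
    rare = -20.0 * ((w + 1.0) ** 2 + ACTION_STD ** 2)   # theta = 1 (rare, costly)
    return (1.0 - P_RARE) * common + P_RARE * rare

def gradient_step(w, psi, rng, n=20, lr=0.02):
    """One gradient step using episodes whose theta is drawn with P(theta=1)=psi."""
    rare = rng.random(n) < psi
    a = w + ACTION_STD * rng.standard_normal(n)
    grads = np.where(rare, -40.0 * (a + 1.0), -2.0 * (a - 1.0))
    return w + lr * grads.mean()

def rbf(A, B, length=0.5):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X, y, Xs, noise=1e-2):
    """Zero-mean GP posterior mean and variance at the query points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)

def run_fpo(iters=30, w0=1.0, seed=0):
    """Pick psi each iteration by UCB on a GP over (psi, fingerprint) -> improvement."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 21)
    w, X, y = w0, [], []
    for _ in range(iters):
        # Query every candidate psi at the current fingerprint (here just w).
        Xs = np.column_stack([candidates, np.full_like(candidates, w)])
        if X:
            mean, var = gp_posterior(np.array(X), np.array(y), Xs)
            psi = candidates[np.argmax(mean + 2.0 * np.sqrt(var))]
        else:
            psi = rng.choice(candidates)  # no data yet: pick at random
        w_new = gradient_step(w, psi, rng)
        # Observed improvement, measured under the true distribution.
        X.append([psi, w])
        y.append(expected_return(w_new) - expected_return(w))
        w = w_new
    return w

w_final = run_fpo()
```

Comparing `expected_return(w_final)` with `expected_return(1.0)` shows whether the sampling distributions chosen by the surrogate helped in this toy case. No importance weighting is applied to the gradient here; that is a simplification of the sketch, not a claim about the paper.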

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Machine Learning and ELM