Fingerprint Policy Optimisation for Robust Reinforcement Learning
Supratik Paul, Michael A. Osborne, Shimon Whiteson

TL;DR
This paper introduces Fingerprint Policy Optimisation (FPO), a method that uses Bayesian optimisation to adapt the distribution of environment variables during training. This makes reinforcement learning policies more robust to rare but high-impact events and more efficient to learn.
Contribution
FPO is a novel approach that actively optimises the distribution of environment variables during training. It uses low-dimensional fingerprints of the current policy to make this optimisation practical, which improves policy robustness in reinforcement learning.
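Why a fingerprint helps: if Bayesian optimisation had to condition on every policy parameter, its input space would be as large as the policy itself. The sketch below shows one hypothetical way to compress a policy into a few numbers. It assumes a linear policy whose mean action is `policy_weights @ s` and summarises it by the mean and standard deviation of its actions on a fixed set of probe states. The function name, the linear policy, and the probe-state construction are illustrative assumptions, not necessarily the fingerprints defined in the paper.

```python
import numpy as np

def action_fingerprint(policy_weights, probe_states):
    """Hypothetical low-dimensional summary of a linear policy.

    Assumes the policy's mean action for state s is policy_weights @ s.
    The fingerprint is the mean and standard deviation of those mean actions
    over a fixed set of probe states, so its length (2 * action_dim) does not
    depend on the number of policy parameters.
    """
    actions = probe_states @ policy_weights.T  # (n_probes, action_dim)
    return np.concatenate([actions.mean(axis=0), actions.std(axis=0)])

# Example: a 2-D action, 50-D state policy (100 weights) summarised by 4 numbers.
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 50))
probes = rng.normal(size=(20, 50))  # kept fixed across training iterations
print(action_fingerprint(weights, probes).shape)  # (4,)
```

Keeping the probe states fixed across iterations matters in this sketch. Otherwise, changes in the fingerprint would mix changes in the policy with changes in the probes.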
Findings
FPO efficiently learns robust policies against rare events.
FPO outperforms standard policy gradient methods in robustness.
FPO effectively adapts environment variables during training.
Abstract
Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator. This can lead to slow learning, or convergence to suboptimal policies, if the environment variable has a large impact on the transition dynamics. In this paper, we present fingerprint policy optimisation (FPO), which finds a policy that is optimal in expectation across the distribution of environment variables. The central idea is to use Bayesian optimisation (BO) to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this BO practical, we contribute two easy-to-compute low-dimensional fingerprints of the current policy. Our experiments show that FPO can…
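The central loop described in the abstract can be sketched as follows. A Gaussian process models the one-step improvement as a function of (policy fingerprint, distribution parameter ψ). Each iteration picks the ψ that looks most promising, takes a policy gradient step with environment variables sampled under ψ, and records the improvement actually obtained. This is a toy sketch under several assumptions, none of which are taken from the paper:

- The environment variable is a Bernoulli "rare event" with true probability `P_TRUE`.
- The policy is a single scalar `theta`, so it serves as its own fingerprint.
- The acquisition function is UCB with `beta=2.0`.
- Samples are importance-weighted so the gradient still targets the true distribution.
- Improvement is computed exactly, whereas in practice it would be estimated from rollouts.

The names `pg_step`, `select_psi`, and `fpo_sketch` are hypothetical.

```python
import numpy as np

P_TRUE = 0.05  # hypothetical true probability of the rare event

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-3):
    """Posterior mean and std of a zero-mean, unit-variance GP with RBF kernel."""
    chol = np.linalg.cholesky(rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train)))
    k_ts = rbf_kernel(x_train, x_test)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))  # K^-1 y
    v = np.linalg.solve(chol, k_ts)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
    return k_ts.T @ alpha, np.sqrt(var)

def select_psi(xs, ys, fingerprint, candidates, rng, beta=2.0):
    """Choose psi maximising a UCB estimate of the next policy improvement."""
    if len(ys) < 2:  # too little data for the GP: explore at random
        return rng.choice(candidates)
    y = np.array(ys)
    y_norm = (y - y.mean()) / (y.std() + 1e-8)
    x_test = np.array([np.append(fingerprint, psi) for psi in candidates])
    mean, std = gp_posterior(np.array(xs), y_norm, x_test)
    return candidates[int(np.argmax(mean + beta * std))]

def grad_return(theta, z):
    """Gradient of the toy return: the rare event (z = 1) is heavily penalised."""
    return np.where(z == 1, -20.0 * (theta + 1.0), -2.0 * (theta - 1.0))

def true_objective(theta):
    """Expected toy return under the true rare-event probability."""
    return -(1 - P_TRUE) * (theta - 1.0) ** 2 - P_TRUE * 10.0 * (theta + 1.0) ** 2

def pg_step(theta, psi, rng, n=10, lr=0.05):
    """One importance-weighted gradient step with z sampled from Bernoulli(psi)."""
    z = (rng.random(n) < psi).astype(int)
    w = np.where(z == 1, P_TRUE / psi, (1 - P_TRUE) / (1 - psi))
    return theta + lr * np.mean(w * grad_return(theta, z))

def fpo_sketch(iterations=30, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.05, 0.95, 19)
    theta, xs, ys = -2.0, [], []
    for _ in range(iterations):
        fingerprint = np.array([theta])
        psi = select_psi(xs, ys, fingerprint, candidates, rng)
        new_theta = pg_step(theta, psi, rng)
        xs.append(np.append(fingerprint, psi))
        ys.append(true_objective(new_theta) - true_objective(theta))
        theta = new_theta
    return theta

print(fpo_sketch())
```

Conditioning the GP on the fingerprint lets the model learn that the most useful ψ changes as the policy changes. Oversampling the rare event may help early, for example, but not once the policy already handles it.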
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Machine Learning and ELM
