How to Enable Uncertainty Estimation in Proximal Policy Optimization

Eugene Bykovets; Yannick Metz; Mennatallah El-Assady; Daniel A. Keim; Joachim M. Buhmann

arXiv:2210.03649·cs.LG·October 10, 2022

Open Access

TL;DR

This paper introduces definitions and methods for uncertainty estimation in on-policy RL, specifically PPO, and compares their effectiveness in detecting out-of-distribution states across various environments.

Contribution

It proposes definitions of uncertainty and OOD for actor-critic RL algorithms, specifically PPO, implements multiple uncertainty estimation methods, and formulates a Pareto optimization problem for balancing reward and OOD detection performance.

Findings

01

Masksembles offer a good balance between uncertainty estimation and reward performance.

02

Uncertainty estimation methods vary in OOD detection quality.

03

Pareto optimization improves OOD detection without sacrificing reward.
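
The trade-off behind the third finding can be illustrated with a generic Pareto-front filter. This is a minimal sketch, not the paper's formulation: it assumes each candidate agent configuration is summarized by two numbers to maximize, a mean reward and an OOD detection score. The function name `pareto_front` and the candidate values are hypothetical and made up purely for illustration.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Return indices of points that no other point dominates.

    Both objectives are maximized. Point j dominates point i when j is
    at least as good on both objectives and strictly better on one.
    """
    front = []
    for i, (r_i, s_i) in enumerate(points):
        dominated = any(
            (r_j >= r_i and s_j >= s_i) and (r_j > r_i or s_j > s_i)
            for j, (r_j, s_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (mean reward, OOD detection score) pairs; illustrative only,
# not results from the paper.
candidates = [(100.0, 0.60), (95.0, 0.80), (90.0, 0.70), (80.0, 0.90)]
front = pareto_front(candidates)
print(front)
```

In this toy input the third candidate is dropped because the second one beats it on both objectives; the remaining candidates each represent a different compromise between reward and OOD detection.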

Abstract

While deep reinforcement learning (RL) agents have shown strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons. First, concepts like uncertainty and OOD states are not as well defined as in supervised learning, especially for on-policy RL methods. Second, available implementations and comparative studies for uncertainty estimation methods in RL have been limited. To overcome the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely, proximal policy optimization…
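
As background for the ensemble-style techniques named in the abstract, the sketch below shows the general idea of using disagreement among ensemble members as an uncertainty signal. It is an assumption-laden toy, not the definitions proposed in the paper (the abstract is truncated before stating them): the "value heads" are hypothetical linear functions standing in for trained critics, and `make_toy_ensemble` and `value_uncertainty` are illustrative names.

```python
import random
import statistics
from typing import Callable, List, Sequence

State = Sequence[float]
ValueHead = Callable[[State], float]


def make_toy_ensemble(k: int, dim: int, seed: int = 0) -> List[ValueHead]:
    """Build k hypothetical linear value heads with slightly different weights.

    Stand-in for k independently trained critics (assumption): the heads
    roughly agree near the origin and diverge for states far from it.
    """
    rng = random.Random(seed)
    members: List[ValueHead] = []
    for _ in range(k):
        weights = [1.0 + rng.gauss(0.0, 0.1) for _ in range(dim)]
        # Bind weights via a default argument so each head keeps its own copy.
        members.append(lambda s, w=weights: sum(wi * si for wi, si in zip(w, s)))
    return members


def value_uncertainty(members: List[ValueHead], state: State) -> float:
    """Population standard deviation of the members' value predictions."""
    predictions = [member(state) for member in members]
    return statistics.pstdev(predictions)


ensemble = make_toy_ensemble(k=5, dim=3)
near = value_uncertainty(ensemble, [0.1, 0.1, 0.1])    # small-magnitude state
far = value_uncertainty(ensemble, [10.0, 10.0, 10.0])  # large-magnitude state
print(near, far)
```

A state could then be flagged as potentially OOD when this disagreement exceeds a threshold chosen on held-out in-distribution states; how to set that threshold is outside the scope of this sketch.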

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Smart Grid Energy Management

Methods: Deep Ensembles · Dropout