How to Enable Uncertainty Estimation in Proximal Policy Optimization
Eugene Bykovets, Yannick Metz, Mennatallah El-Assady, Daniel A. Keim, Joachim M. Buhmann

TL;DR
This paper introduces definitions and methods for uncertainty estimation in on-policy RL, specifically proximal policy optimization (PPO), and compares how well these methods detect out-of-distribution (OOD) states across various environments.
Contribution
It provides the first formal definitions of uncertainty and OOD states for PPO, implements and compares several uncertainty estimation methods, and proposes a Pareto optimization approach for balancing reward against OOD-detection performance.
Findings
Masksembles offer a good balance between uncertainty estimation and reward performance.
Uncertainty estimation methods vary in OOD detection quality.
Pareto optimization improves OOD detection without sacrificing reward.
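The summary does not say how the Pareto optimization is carried out. One minimal reading, assumed here rather than taken from the paper, is to score candidate agents (for example, training checkpoints or hyperparameter settings) on two objectives to be maximized, mean episode reward and OOD-detection AUROC, and keep only the candidates that no other candidate dominates. The checkpoint names, numbers, and the helpers `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, mean episode reward, OOD-detection AUROC).
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    _, reward_a, auroc_a = a
    _, reward_b, auroc_b = b
    return (reward_a >= reward_b and auroc_a >= auroc_b
            and (reward_a > reward_b or auroc_a > auroc_b))


def pareto_front(candidates: List[Candidate]) -> List[str]:
    """Names of candidates not dominated by any other (a candidate never dominates itself)."""
    return [c[0] for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative values only; these are not results from the paper.
checkpoints: List[Candidate] = [
    ("ckpt_a", 210.0, 0.71),
    ("ckpt_b", 195.0, 0.83),
    ("ckpt_c", 190.0, 0.80),  # dominated by ckpt_b on both objectives
    ("ckpt_d", 230.0, 0.65),
]
print(pareto_front(checkpoints))  # ['ckpt_a', 'ckpt_b', 'ckpt_d']
```

Choosing a single point from the front, such as the highest-AUROC checkpoint whose reward stays within some tolerance of the best, is a separate, application-specific decision.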
Abstract
While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons. First, concepts like uncertainty and OOD states are less well defined than in supervised learning, especially for on-policy RL methods. Second, available implementations and comparative studies for uncertainty estimation methods in RL have been limited. To overcome the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely, proximal policy optimization…
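The abstract names Monte-Carlo Dropout and Deep Ensembles without showing how either becomes an OOD signal. A common recipe, sketched here as an assumption rather than as the paper's definition, treats disagreement between several value estimates (independently trained ensemble members, or repeated forward passes with dropout left on) as an uncertainty score, then flags a state as OOD when that score exceeds a threshold calibrated on in-distribution states. The sketch uses critic values; the paper's definitions may instead or also use the actor's action distribution. The linear critics, `calibrate_threshold`, the 0.95 quantile, and every other name below are hypothetical stand-ins for trained PPO networks.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

State = Sequence[float]
Critic = Callable[[State], float]


def make_linear_critic(seed: int, dim: int) -> Critic:
    """Hypothetical stand-in for one trained critic: a fixed random linear model."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return lambda s: sum(w * x for w, x in zip(weights, s))


def ensemble_values(critics: List[Critic], state: State) -> List[float]:
    """Deep Ensembles: one value estimate per independently trained member."""
    return [critic(state) for critic in critics]


def mc_dropout_values(weights: List[float], state: State, k: int,
                      p_drop: float, rng: random.Random) -> List[float]:
    """MC Dropout: k forward passes of one model with dropout kept on.

    Each pass samples a fresh Bernoulli keep-mask over the inputs and
    rescales kept inputs by 1 / (1 - p_drop) (inverted dropout).
    """
    keep = 1.0 - p_drop
    values = []
    for _ in range(k):
        mask = [1.0 / keep if rng.random() < keep else 0.0 for _ in weights]
        values.append(sum(w * m * x for w, m, x in zip(weights, mask, state)))
    return values


def uncertainty(values: List[float]) -> float:
    """Disagreement between members, measured as the population std."""
    return statistics.pstdev(values)


def calibrate_threshold(in_dist_scores: List[float], q: float = 0.95) -> float:
    """Empirical q-quantile of uncertainty scores on in-distribution states."""
    ordered = sorted(in_dist_scores)
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]


def is_ood(score: float, threshold: float) -> bool:
    """Flag a state as OOD when its uncertainty exceeds the calibrated threshold."""
    return score > threshold


# Usage: a five-member ensemble over 4-dimensional states.
dim = 4
critics = [make_linear_critic(seed, dim) for seed in range(5)]
rng = random.Random(0)
train_states = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
threshold = calibrate_threshold(
    [uncertainty(ensemble_values(critics, s)) for s in train_states])

far_state = [25.0 * x for x in train_states[0]]  # far outside the training range
print(is_ood(uncertainty(ensemble_values(critics, far_state)), threshold))

# The same score applied to MC Dropout passes of a single toy model.
mc_score = uncertainty(mc_dropout_values([1.0] * dim, far_state, k=10, p_drop=0.2, rng=rng))
print(mc_score)
```

Because the toy critics are linear, their disagreement grows in proportion to the state's magnitude, which is why states scaled far from the training data tend to be flagged; trained deep networks offer no such guarantee, and how well each method separates in-distribution from OOD states is exactly what the paper compares empirically.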
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Smart Grid Energy Management
Methods: Deep Ensembles · Dropout
