Estimating Risk and Uncertainty in Deep Reinforcement Learning

William R. Clements; Bastien Van Delft; Benoît-Marie Robaglia; Reda Bahi Slaoui; Sébastien Toth

arXiv:1905.09638 · cs.LG · September 10, 2020 · 55 cites

TL;DR

This paper presents a framework for disentangling and estimating epistemic and aleatoric uncertainties in deep reinforcement learning, enabling safer exploration and risk-sensitive decision-making.

Contribution

The authors derive unbiased estimators of epistemic and aleatoric uncertainty on learned Q-values and introduce an uncertainty-aware DQN algorithm that exhibits safe learning behavior and outperforms other DQN variants on MinAtar.

Findings

01. The proposed estimators recover both epistemic and aleatoric uncertainty on an RL agent's learned Q-values.

02. The uncertainty-aware DQN outperforms other DQN variants on the MinAtar testbed.

03. The uncertainty-aware agent exhibits safe learning behavior in stochastic environments.

Abstract

Reinforcement learning agents are faced with two types of uncertainty. Epistemic uncertainty stems from limited data and is useful for exploration, whereas aleatoric uncertainty arises from stochastic environments and must be accounted for in risk-sensitive applications. We highlight the challenges involved in simultaneously estimating both of them, and propose a framework for disentangling and estimating these uncertainties on learned Q-values. We derive unbiased estimators of these uncertainties and introduce an uncertainty-aware DQN algorithm, which we show exhibits safe learning behavior and outperforms other DQN variants on the MinAtar testbed.
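
To make the two notions in the abstract concrete, here is a minimal sketch of one common way to split them. It assumes an ensemble of models that each predict a set of return quantiles for a single fixed state-action pair. The function name `decompose_uncertainty` and the ensemble-of-quantiles setup are illustrative assumptions, not taken from the paper, and this naive plug-in version is not the unbiased estimator the authors derive.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(quantiles_per_member):
    """Split the spread of predicted returns into epistemic and aleatoric parts.

    quantiles_per_member[m][i] is the i-th return quantile predicted by
    ensemble member m for one fixed (state, action) pair.
    Hypothetical helper for illustration only.
    """
    n_quantiles = len(quantiles_per_member[0])
    # Column i collects every member's estimate of quantile i.
    columns = [
        [member[i] for member in quantiles_per_member]
        for i in range(n_quantiles)
    ]
    # Epistemic: disagreement between members about each quantile,
    # averaged over quantiles. It shrinks as the members converge.
    epistemic = fmean([pvariance(col) for col in columns])
    # Aleatoric: spread of the member-averaged quantiles, i.e. the width
    # of the return distribution itself. It persists even when members agree.
    aleatoric = pvariance([fmean(col) for col in columns])
    return epistemic, aleatoric


# Members agree on a wide return distribution: purely aleatoric.
print(decompose_uncertainty([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]))
# Members each predict a deterministic return but disagree: purely epistemic.
print(decompose_uncertainty([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]))
```

In an agent built along these lines, the epistemic term could drive exploration while the aleatoric term could penalize risky actions, which matches the two roles the abstract assigns to them.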

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning