Stochastically Dominant Distributional Reinforcement Learning

John D. Martin; Michal Lyskawinski; Xiaohu Li; Brendan Englot

arXiv:1905.07318 · cs.LG · October 8, 2020 · 1 citation

TL;DR

This paper introduces a distributional reinforcement learning approach that selects actions using the second-order stochastic dominance (SSD) relation to manage aleatoric uncertainty. It also presents a particle-based algorithm that the authors report balances uncertainty and performance better than other risk measures.

Contribution

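The paper proposes an SSD-based distributional RL method, maps the distributional RL problem to a Wasserstein gradient flow with the distributional Bellman residual as the potential energy functional, and provides a particle-based algorithm with proofs of optimality and convergence.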

Findings

01 An SSD-based policy balances uncertainty and performance better than policies based on other risk measures.

02 The authors prove optimality and convergence for the particle-based algorithm.

03 Experiments characterize the algorithm's performance and support the effectiveness of the proposed approach.

Abstract

We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This compares the inherent dispersion of random returns induced by actions, producing a more comprehensive and robust evaluation of the environment's uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm performance and demonstrate how uncertainty and performance are better balanced…
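To make the SSD relation concrete: a return X dominates a return Y in the second order when E[(z - X)^+] <= E[(z - Y)^+] for every threshold z, which is equivalent to every risk-averse (nondecreasing, concave) utility weakly preferring X. The following is a minimal sketch of that check for two sets of equally weighted return samples, such as the particles a particle-based method would maintain. It is an illustration under stated assumptions, not the authors' algorithm; the function names `lower_partial_moment` and `ssd_dominates` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def lower_partial_moment(samples, thresholds):
    """Return E[(z - X)^+] for each threshold z, with equally weighted samples."""
    samples = np.asarray(samples, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    shortfall = np.maximum(thresholds[:, None] - samples[None, :], 0.0)
    return shortfall.mean(axis=1)


def ssd_dominates(x, y, tol=1e-12):
    """Check whether the empirical distribution of x dominates that of y (SSD).

    Both lower partial moments are piecewise linear in z with kinks only at
    sample values, and their difference is constant beyond the largest sample,
    so testing the inequality at the union of sample values is sufficient.
    """
    grid = np.union1d(x, y)
    return bool(np.all(lower_partial_moment(x, grid)
                       <= lower_partial_moment(y, grid) + tol))


# Two hypothetical actions with the same mean return but different dispersion.
narrow = [1.0, 2.0, 3.0]
wide = [0.0, 2.0, 4.0]
print(ssd_dominates(narrow, wide))  # True: same mean, less dispersion
print(ssd_dominates(wide, narrow))  # False
```

Because the two sample sets share a mean, a policy that ranks actions by expected return alone would be indifferent between them, while the SSD check prefers the less dispersed one. SSD is only a partial order, so both calls can return False for some pairs of actions.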

Videos

Stochastically Dominant Distributional Reinforcement Learning · SlidesLive

Taxonomy

Topics: Risk and Portfolio Optimization · Probabilistic and Robust Engineering Design · Advanced Multi-Objective Optimization Algorithms

Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD