Soft-Robust Algorithms for Batch Reinforcement Learning

Elita A. Lobo; Mohammad Ghavamzadeh; Marek Petrik

arXiv:2011.14495 · cs.LG · March 1, 2021

TL;DR

This paper introduces the soft-robust criterion for batch reinforcement learning, which balances mean performance against risk, and proposes algorithms that yield less conservative policies than existing percentile-based methods.

Contribution

The paper establishes the fundamental properties of the soft-robust criterion, shows that optimizing it is NP-hard, and proposes two approximate algorithms supported by theoretical analysis and empirical evaluation.

Findings

01. The proposed algorithms produce less conservative policies than percentile-based methods.

02. The soft-robust criterion balances mean performance and risk.

03. The proposed methods outperform existing approaches in empirical evaluations.

Abstract

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure. Unfortunately, such policies are typically overly conservative as the percentile criterion is non-convex, difficult to optimize, and ignores the mean performance. To overcome these shortcomings, we study the soft-robust criterion, which uses risk measures to balance the mean and percentile criterion better. In this paper, we establish the soft-robust criterion's fundamental properties, show that it is NP-hard to optimize, and propose and analyze two algorithms to approximately optimize it. Our theoretical analyses and empirical evaluations demonstrate that our algorithms compute much less conservative solutions than the existing approximate methods for optimizing the…
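The abstract does not spell out the criterion's formula. As an illustration only, the sketch below assumes one common way to "balance the mean and percentile criterion" with a risk measure: a convex combination of the mean return and the lower-tail CVaR of returns sampled across plausible models. The function names, the weight `lam`, the risk level `alpha`, and the sample-based CVaR estimator are all assumptions made for this example, not the paper's definitions or algorithms.

```python
import math

def cvar_lower_tail(returns, alpha):
    """Sample estimate of CVaR: mean of the worst ceil(alpha * n) returns.

    Assumption: a simple empirical estimator chosen for illustration.
    """
    ordered = sorted(float(r) for r in returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k

def soft_robust_score(returns, lam=0.5, alpha=0.1):
    """Hypothetical soft-robust score: (1 - lam) * mean + lam * CVaR_alpha.

    lam = 0 recovers the plain mean; lam = 1 is fully risk-averse.
    """
    values = [float(r) for r in returns]
    mean = sum(values) / len(values)
    return (1.0 - lam) * mean + lam * cvar_lower_tail(values, alpha)

# Returns of one policy evaluated under four sampled models (made-up numbers).
sampled_returns = [1.0, 2.0, 3.0, 4.0]
score = soft_robust_score(sampled_returns, lam=0.5, alpha=0.5)
```

Under this assumed form, a policy is scored on both its average return and its bad-case return, so raising `lam` moves the score from the mean toward the tail risk.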

Taxonomy

Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management · Adaptive Dynamic Programming Control