Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

Dake Zhang; Boxiang Lyu; Shuang Qiu; Mladen Kolar; Tong Zhang

arXiv:2407.07631 · cs.LG · July 11, 2024

TL;DR

This paper develops the first provably efficient offline reinforcement learning algorithms that incorporate risk sensitivity through the entropic risk measure. The algorithms are tailored to linear MDPs and target decision-making under uncertainty.
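For readers new to the entropic risk measure, here is a minimal sketch of how it scores a random return. It assumes the common textbook form (1/β) log E[exp(βX)] for a risk parameter β ≠ 0. Under this convention, β < 0 is risk-averse, β > 0 is risk-seeking, and β → 0 recovers the expectation. The paper's own sign convention may differ. The function name `entropic_risk` is hypothetical and only handles a discrete distribution.

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete random variable X.

    Convention assumed here: beta < 0 is risk-averse, beta > 0 is risk-seeking,
    and beta == 0 is treated as the limit, i.e. the plain expectation.
    """
    # Drop zero-probability outcomes so they cannot dominate the max below.
    pairs = [(x, p) for x, p in zip(values, probs) if p > 0]
    if beta == 0:
        return sum(p * x for x, p in pairs)
    # Log-sum-exp trick: factor out the largest exponent to avoid overflow.
    scaled = [(beta * x, p) for x, p in pairs]
    m = max(s for s, _ in scaled)
    log_mgf = m + math.log(sum(p * math.exp(s - m) for s, p in scaled))
    return log_mgf / beta

# A fair coin paying 0 or 1: the mean is 0.5.
coin_values, coin_probs = [0.0, 1.0], [0.5, 0.5]
for beta in (-2.0, 0.0, 2.0):
    print(beta, round(entropic_risk(coin_values, coin_probs, beta), 4))
```

For the fair coin, the risk-averse score (about 0.283) sits below the mean of 0.5 and the risk-seeking score (about 0.717) sits above it. This gap is the behavior that a risk-sensitive objective trades on.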

Contribution

The paper introduces two risk-sensitive offline RL algorithms with provable guarantees. Together they address a gap in the theoretical understanding of risk-aware decision-making from fixed datasets.

Findings

1. First risk-sensitive offline RL algorithms with theoretical guarantees
2. Improved bounds using variance information and reference-advantage decomposition
3. Improved dependence on the dimension and the risk factor in the performance bounds
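Finding 2 credits variance information for the tighter bounds. In linear-MDP analyses, a common way variance enters is inverse-variance weighted ridge regression. Each sample is down-weighted by an estimate of its conditional variance, so noisy targets pull the fit less. The sketch below shows only that generic estimator and is not the paper's construction. The names `weighted_ridge` and `solve` are hypothetical. The variances are assumed to be given rather than estimated from data. The reference-advantage decomposition is not illustrated.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def weighted_ridge(features, targets, variances, lam=1.0):
    """Minimize sum_i (y_i - phi_i^T w)^2 / var_i + lam * ||w||^2.

    Closed form: w = (lam*I + sum_i phi_i phi_i^T / var_i)^{-1} sum_i phi_i y_i / var_i.
    """
    d = len(features[0])
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for phi, y, var in zip(features, targets, variances):
        weight = 1.0 / var  # high-variance samples get small weight
        for i in range(d):
            b[i] += weight * phi[i] * y
            for j in range(d):
                A[i][j] += weight * phi[i] * phi[j]
    return solve(A, b)

# Noiseless targets y = 2*phi_0 - phi_1; the weights do not bias a consistent fit.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, -1.0, 1.0]
sig2 = [1.0, 2.0, 0.5]
print(weighted_ridge(feats, ys, sig2, lam=1e-8))  # close to [2.0, -1.0]
```

Weighting matters when targets conflict. For example, two observations of the same feature with targets 0 (variance 1) and 10 (variance 100) yield an estimate near 0.1 rather than the unweighted mean of 5.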

Abstract

We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. In particular, our work focuses on applying the entropic risk measure to RL problems. While existing literature primarily investigates the online setting, there remains a large gap in understanding how to efficiently derive a near-optimal policy based on this risk measure using only a pre-collected dataset. We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint. In response, we introduce two provably sample-efficient algorithms. We begin by presenting a risk-sensitive pessimistic value iteration algorithm, offering a tight analysis by leveraging the structure of the…
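The abstract names a risk-sensitive pessimistic value iteration algorithm. What follows is a simplified sketch of the general idea under stated assumptions, not the authors' algorithm.

- **Features.** It uses one-hot state-action features. A tabular MDP is the special case of a linear MDP with d = |S||A|, so the ridge Gram matrix is diagonal.
- **Rewards.** Rewards are assumed to lie in [0, 1].
- **Backward pass.** It runs backward over steps and regresses the exponential target exp(β(r + V')).
- **Pessimism.** It moves that estimate toward the pessimistic end by an elliptical bonus, clips it to its feasible range, and maps it back with log(·)/β.

The function `risk_sensitive_pvi` and the parameters `lam` and `c_bonus` are hypothetical. The paper's exact penalty, its constants, and where it enters are not reproduced here.

```python
import math
from collections import defaultdict

def risk_sensitive_pvi(dataset, states, actions, beta, lam=1.0, c_bonus=1.0):
    """Backward induction on Z = E[exp(beta * return)] with a pessimistic bonus.

    dataset[h] is a list of (s, a, r, s_next) transitions observed at step h,
    with rewards assumed in [0, 1]. Returns a greedy policy per step and the
    pessimistic entropic values at step 0.
    """
    if beta == 0:
        raise ValueError("beta must be nonzero")
    horizon = len(dataset)
    v_next = {s: 0.0 for s in states}  # value after the final step is 0
    policy = [None] * horizon
    for h in reversed(range(horizon)):
        remaining = horizon - h  # returns from step h lie in [0, remaining]
        z_lo, z_hi = sorted((1.0, math.exp(beta * remaining)))
        width = z_hi - z_lo
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s2 in dataset[h]:
            sums[(s, a)] += math.exp(beta * (r + v_next[s2]))
            counts[(s, a)] += 1
        v_h, pi_h = {}, {}
        for s in states:
            best_a, best_q = None, -math.inf
            for a in actions:
                n = counts[(s, a)]
                # Ridge estimate with one-hot features: Lambda is diagonal, entry n + lam.
                z_hat = sums[(s, a)] / (n + lam)
                # Bonus c * sqrt(phi^T Lambda^{-1} phi), scaled to the range of Z.
                bonus = c_bonus * width / math.sqrt(n + lam)
                # Pessimism: a larger Z is worse when beta < 0 and better when beta > 0.
                z_pess = z_hat + bonus if beta < 0 else z_hat - bonus
                z_pess = min(max(z_pess, z_lo), z_hi)  # keep Z in its feasible range
                q = math.log(z_pess) / beta  # back to the entropic value scale
                if q > best_q:
                    best_a, best_q = a, q
            pi_h[s], v_h[s] = best_a, best_q
        policy[h], v_next = pi_h, v_h
    return policy, v_next

# One state, horizon 1: "safe" pays 0.5; "risky" pays 1 or 0 with equal frequency.
toy_data = [[("s", "safe", 0.5, "s")] * 100
            + [("s", "risky", 1.0, "s")] * 50
            + [("s", "risky", 0.0, "s")] * 50]
for beta in (-2.0, 2.0):
    pol, vals = risk_sensitive_pvi(toy_data, ["s"], ["safe", "risky"], beta)
    print(beta, pol[0]["s"], round(vals["s"], 3))
```

On the toy data, the risk-averse setting (β = -2) picks "safe" and the risk-seeking setting (β = 2) picks "risky", even though both actions have the same mean reward. Pessimism is applied in Z-space and not on the value directly. The design keeps the regression linear in the features, because the exponential target is what the linear structure models in this sketch.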
