Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang

TL;DR
This paper develops the first provably efficient offline reinforcement learning algorithms that incorporate risk sensitivity through the entropic risk measure, in the linear MDP setting.
Contribution
It introduces two novel risk-sensitive offline RL algorithms with provable guarantees, addressing a significant gap in the theoretical understanding of risk-aware decision-making from fixed datasets.
Findings
First risk-sensitive offline RL algorithms with theoretical guarantees
Improved bounds using variance information and reference-advantage decomposition
Tighter dependence on the dimension and risk factor in the performance bounds
Abstract
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work focuses on applying the entropic risk measure to RL problems. While existing literature primarily investigates the online setting, there remains a large gap in understanding how to efficiently derive a near-optimal policy based on this risk measure using only a pre-collected dataset. We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint. In response, we introduce two provably sample-efficient algorithms. We begin by presenting a risk-sensitive pessimistic value iteration algorithm, offering a tight analysis by leveraging the structure of the…
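For readers unfamiliar with the entropic risk measure named in the abstract, the sketch below computes its standard empirical form, (1/β) log E[exp(βR)], on a small sample of returns. It is only an illustration of the risk measure, not the paper's algorithms. The function name `entropic_risk`, the sample returns, and the sign convention (β < 0 risk-averse, β > 0 risk-seeking, β → 0 risk-neutral) are assumptions chosen for this example; the excerpt above does not show the paper's own notation.

```python
import math


def entropic_risk(returns, beta):
    """Empirical entropic risk: (1 / beta) * log(mean(exp(beta * r))).

    Hypothetical helper for illustration only. beta == 0 is treated as
    the risk-neutral limit, which is the plain mean.
    """
    if beta == 0:
        return sum(returns) / len(returns)
    # Log-mean-exp with the maximum subtracted for numerical stability.
    scaled = [beta * r for r in returns]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


# Made-up sample of returns, used only to show the effect of beta.
returns = [0.0, 1.0, 2.0]
risk_averse = entropic_risk(returns, beta=-1.0)   # weights low returns more
risk_neutral = entropic_risk(returns, beta=0.0)   # plain average
risk_seeking = entropic_risk(returns, beta=1.0)   # weights high returns more
```

Under this convention the three values are ordered: the risk-averse value lies below the mean and the risk-seeking value above it, while a deterministic return is left unchanged for any β.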
Taxonomy
Topics: Optimism, Hope, and Well-being
