Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk

Sumedh Gupte; Shrey Rakeshkumar Patel; Soumen Pachal; Prashanth L. A.; Sanjay P. Bhat

arXiv:2602.09300 · cs.LG · February 11, 2026

TL;DR

This paper develops risk-sensitive reinforcement learning algorithms for expectiles, utility-based shortfall risk, and optimized certainty equivalent (OCE) risk. For each risk measure it derives a policy gradient theorem, gradient estimators, and convergence bounds, and it validates the algorithms through experiments.

Contribution

It introduces novel risk-sensitive policy gradient algorithms for three families of risk measures, with theoretical convergence guarantees and empirical validation.
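For reference, here are the standard definitions of the three families, written for a random loss $L$ where larger values are worse (for instance, the negative cumulative reward of a trajectory). The loss convention and the symbols $\tau$, $\ell$, $\lambda$ and $\phi$ are assumptions made for illustration; the paper may state the measures with different signs or parametrizations.

```latex
\[
\begin{aligned}
&\text{Expectile, } \tau \in (0,1):\quad
  e_\tau(L) = \operatorname*{arg\,min}_{e \in \mathbb{R}}
  \mathbb{E}\big[\tau (L-e)_+^2 + (1-\tau)(e-L)_+^2\big],\\
&\qquad \text{equivalently}\quad \tau\,\mathbb{E}[(L-e_\tau)_+] = (1-\tau)\,\mathbb{E}[(e_\tau-L)_+];\\[4pt]
&\text{Shortfall risk, } \ell \text{ increasing and convex}:\quad
  \mathrm{SR}_{\ell,\lambda}(L) = \inf\{t \in \mathbb{R} : \mathbb{E}[\ell(L-t)] \le \lambda\};\\[4pt]
&\text{OCE risk, } \phi \text{ convex},\ \phi(0)=0,\ 1 \in \partial\phi(0):\quad
  \mathrm{OCE}_\phi(L) = \inf_{\eta \in \mathbb{R}} \big\{\eta + \mathbb{E}[\phi(L-\eta)]\big\}.
\end{aligned}
\]
```

Under this convention, $\tau = 1/2$ gives $e_{1/2}(L) = \mathbb{E}[L]$, and choosing $\phi(x) = x_+/(1-\alpha)$ turns the OCE risk into $\mathrm{CVaR}_\alpha(L)$.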

Findings

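01 Established $O(1/m)$ mean-squared error bounds for the risk-sensitive policy gradient estimators, where $m$ is the number of trajectories

02 Proved smoothness of the risk-sensitive objectives and derived stationary convergence rate bounds

03 Validated the algorithms on popular RL benchmarks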
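The $O(1/m)$ bounds concern estimators of the risk-sensitive policy gradient built from $m$ trajectories. For shortfall risk, one way to obtain such an estimator, derived here rather than taken from the paper, is to differentiate the root condition $\mathbb{E}_\theta[\ell(L - t^\ast)] = \lambda$ implicitly and apply the likelihood-ratio trick, giving $\nabla_\theta t^\ast = \mathbb{E}_\theta[(\ell(L - t^\ast) - \lambda)\,\nabla_\theta \log p_\theta(\xi)] / \mathbb{E}_\theta[\ell'(L - t^\ast)]$ for a trajectory $\xi$ with loss $L$. The sketch below is a plug-in version for a scalar parameter, assuming a differentiable, strictly increasing $\ell$; `ubsr_gradient`, `toy_gaussian_check` and the one-step Gaussian policy are hypothetical.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def _ubsr_root(losses: Sequence[float], loss_fn: Callable[[float], float],
               lam: float) -> float:
    """Plug-in shortfall risk: the t solving mean(loss_fn(L_i - t)) = lam."""
    def h(t: float) -> float:
        return sum(loss_fn(x - t) for x in losses) / len(losses) - lam
    lo, hi = min(losses) - 1.0, max(losses) + 1.0
    for _ in range(60):  # widen [lo, hi] until h changes sign across it
        if h(lo) > 0 >= h(hi):
            break
        width = hi - lo
        lo, hi = lo - width, hi + width
    else:
        raise ValueError("no sign change found; is lam in the range of loss_fn?")
    for _ in range(60):  # bisection; h is non-increasing in t
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ubsr_gradient(losses: Sequence[float], scores: Sequence[float],
                  loss_fn: Callable[[float], float],
                  loss_deriv: Callable[[float], float],
                  lam: float) -> Tuple[float, float]:
    """Return (t_hat, grad_hat) for a scalar parameter theta.

    losses[i] is the loss of sampled trajectory i and scores[i] is
    d/dtheta log p_theta(trajectory i).
    """
    m = len(losses)
    t_hat = _ubsr_root(losses, loss_fn, lam)
    # Likelihood-ratio term; subtracting lam acts as a baseline since E[score] = 0.
    num = sum((loss_fn(x - t_hat) - lam) * s for x, s in zip(losses, scores)) / m
    # Magnitude of the derivative of the root condition with respect to t.
    den = sum(loss_deriv(x - t_hat) for x in losses) / m
    return t_hat, num / den


def toy_gaussian_check(seed: int = 0, m: int = 20000, theta: float = 0.3,
                       sigma: float = 0.5) -> Tuple[float, float]:
    """Hypothetical one-step example: action a ~ N(theta, sigma^2), loss = a.

    Shifting theta shifts the loss distribution, so dSR/dtheta = 1 exactly,
    and with loss_fn = exp, lam = 1 the risk level is theta + sigma^2 / 2.
    """
    rng = random.Random(seed)
    actions = [rng.gauss(theta, sigma) for _ in range(m)]
    scores = [(a - theta) / sigma ** 2 for a in actions]  # Gaussian score function
    return ubsr_gradient(actions, scores, math.exp, math.exp, lam=1.0)


if __name__ == "__main__":
    print(toy_gaussian_check())
```

In the toy check the exact gradient is 1, so the estimate should land near it. In an MDP, `scores[i]` would instead be the sum of $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ along trajectory $i$.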

Abstract

We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a finite-horizon Markov decision process, we first derive a policy gradient theorem. Second, we propose estimators of the risk-sensitive policy gradient for each of the aforementioned risk measures, and establish $O(1/m)$ mean-squared error bounds for our estimators, where $m$ is the number of trajectories. Further, under standard assumptions for policy gradient-type algorithms, we establish smoothness of the risk-sensitive objective, in turn leading to stationary convergence rate bounds for the overall risk-sensitive policy gradient algorithm that we propose. Finally, we conduct numerical experiments to validate the theoretical…
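As a sketch of how these risk levels can be computed from $m$ sampled trajectory losses, the Python below evaluates plug-in (empirical-distribution) versions of the standard definitions given in the docstrings. It assumes losses where larger is worse, a strictly increasing `loss_fn` for shortfall risk, and a convex `phi` with `phi(0) = 0` and 1 in its subgradient at 0 for the OCE risk. The function names are hypothetical and this is not the paper's implementation.

```python
import math
from typing import Callable, Sequence


def _bisect_decreasing(f: Callable[[float], float], lo: float, hi: float,
                       iters: int = 100) -> float:
    """Root of a non-increasing f on [lo, hi], assuming f(lo) >= 0 >= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expectile(losses: Sequence[float], tau: float) -> float:
    """Empirical tau-expectile: e with tau*mean((L-e)_+) = (1-tau)*mean((e-L)_+)."""
    def g(e: float) -> float:
        up = sum(max(x - e, 0.0) for x in losses)
        down = sum(max(e - x, 0.0) for x in losses)
        return tau * up - (1.0 - tau) * down  # non-increasing in e
    return _bisect_decreasing(g, min(losses), max(losses))


def shortfall_risk(losses: Sequence[float], loss_fn: Callable[[float], float],
                   lam: float) -> float:
    """Empirical UBSR: smallest t with mean(loss_fn(L - t)) <= lam."""
    def h(t: float) -> float:
        return sum(loss_fn(x - t) for x in losses) / len(losses) - lam
    lo, hi = min(losses) - 1.0, max(losses) + 1.0
    for _ in range(60):  # widen [lo, hi] until h changes sign across it
        if h(lo) > 0 >= h(hi):
            return _bisect_decreasing(h, lo, hi)
        width = hi - lo
        lo, hi = lo - width, hi + width
    raise ValueError("no sign change found; is lam in the range of loss_fn?")


def oce_risk(losses: Sequence[float], phi: Callable[[float], float],
             iters: int = 200) -> float:
    """Empirical OCE risk: min over eta of eta + mean(phi(L - eta)).

    With phi convex, phi(0) = 0 and 1 in its subgradient at 0, a minimizer lies
    in [min L, max L], so ternary search on the convex objective suffices.
    """
    def obj(eta: float) -> float:
        return eta + sum(phi(x - eta) for x in losses) / len(losses)
    lo, hi = min(losses), max(losses)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if obj(m1) <= obj(m2):
            hi = m2
        else:
            lo = m1
    return obj(0.5 * (lo + hi))


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    print(expectile(samples, 0.9))
    print(shortfall_risk(samples, math.exp, 1.0))
    print(oce_risk(samples, lambda x: max(x, 0.0) / (1 - 0.5)))  # empirical CVaR_0.5
```

For example, `expectile(losses, 0.5)` returns the sample mean, `shortfall_risk(losses, math.exp, 1.0)` equals `log(mean(exp(L)))`, and `oce_risk(losses, lambda x: max(x, 0.0) / (1 - alpha))` is an empirical CVaR at level `alpha`. Bisection and ternary search are used only because each defining equation is monotone or convex in its scalar variable; any 1-D root finder or convex minimizer would do.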

Taxonomy

Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research