Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinity

Aneri Muni; Vincent Taboga; Esther Derman; Pierre-Luc Bacon; Erick Delage

arXiv:2602.03778·cs.LG·February 4, 2026

TL;DR

This paper introduces a Bellman-operator approach to optimizing static CVaR in MDPs. The operator yields dense per-step rewards and is contracting on the full space of bounded value functions, and the algorithms built on it learn risk-sensitive policies.

Contribution

The paper proposes a new augmentation-based formulation of the static CVaR objective. Its Bellman operator has dense per-step rewards and contraction properties, which supports risk-averse reinforcement learning.

Findings

01. The proposed algorithms learn CVaR-sensitive policies.

02. The learned policies achieve effective performance-safety trade-offs.

03. The analysis provides convergence guarantees and error bounds.

Abstract

Tail-end risk measures such as static conditional value-at-risk (CVaR) are used in safety-critical applications to prevent rare, yet catastrophic events. Unlike risk-neutral objectives, the static CVaR of the return depends on entire trajectories without admitting a recursive Bellman decomposition in the underlying Markov decision process. A classical resolution relies on state augmentation with a continuous variable. However, unless restricted to a specialized class of admissible value functions, this formulation induces sparse rewards and degenerate fixed points. In this work, we propose a novel formulation of the static CVaR objective based on augmentation. Our alternative approach leads to a Bellman operator with: (1) dense per-step rewards; (2) contracting properties on the full space of bounded value functions. Building on this theoretical foundation, we develop risk-averse value…
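As background for the abstract, the static CVaR of a return Z at level alpha can be written either as the mean of the worst alpha-fraction of outcomes or through the standard Rockafellar-Uryasev variational form, max over b of b - (1/alpha) E[(b - Z)^+], which is the usual starting point for augmenting the state with a continuous variable. The sketch below is a minimal illustration of that background, not the paper's formulation. It assumes returns where larger is better, a lower-tail level `alpha` for which `alpha * n` is a whole number, and equally weighted samples; the function names and the sample data are hypothetical.

```python
def cvar_tail_mean(returns, alpha):
    """Lower-tail CVaR as the mean of the worst alpha-fraction of returns.

    Assumes alpha * len(returns) is a positive whole number.
    """
    ordered = sorted(returns)
    k = round(alpha * len(ordered))
    if k < 1:
        raise ValueError("alpha * len(returns) must be at least 1")
    return sum(ordered[:k]) / k


def cvar_variational(returns, alpha):
    """Lower-tail CVaR via max_b { b - E[(b - Z)^+] / alpha }.

    The objective is concave and piecewise linear in b, with kinks only at
    sample values, so checking b at each sample value finds the maximum.
    """
    n = len(returns)

    def objective(b):
        shortfall = sum(max(b - z, 0.0) for z in returns)
        return b - shortfall / (alpha * n)

    return max(objective(b) for b in returns)


# Hypothetical sample of ten episode returns.
sample_returns = [-10, -2, 0, 1, 3, 4, 5, 6, 7, 8]

tail = cvar_tail_mean(sample_returns, alpha=0.2)          # mean of -10 and -2
variational = cvar_variational(sample_returns, alpha=0.2)
```

On this sample both functions evaluate to -6.0, the average of the two worst returns. The agreement relies on `alpha * n` being a whole number; for other levels the tail-mean shortcut needs a fractional weight on the boundary sample.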

Taxonomy

Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Formal Methods in Verification