Long-Run Conditional Value-at-Risk Reinforcement Learning

Qixin Wang; Hao Cao; Jian-Qiang Hu; Mingjie Hu; Li Xia

arXiv:2603.09734·math.OC·March 11, 2026

Open Access

TL;DR

This paper introduces a model-free reinforcement learning algorithm for long-run CVaR optimization in MDPs, providing convergence guarantees and extending to mean-CVaR problems, with numerical validation.

Contribution

The paper develops a CVaR-specific RL algorithm with a convergence analysis and extends it to mean-CVaR optimization, targeting model-free settings in which the transition model is unknown.

Findings

1. The algorithm converges almost surely under appropriate technical conditions.
2. The convergence rate is of order O(1/n).
3. Numerical experiments validate the theoretical results.

Abstract

Conditional value-at-risk (CVaR) is a prominent risk measure in financial engineering, energy systems, and supply chain management. In these domains, Markov decision processes (MDPs) with a long-run CVaR criterion effectively mitigate cost variability over a specified horizon. However, implementing MDPs relies on known transition models, which are typically unavailable in practice. This necessitates a model-free approach to risk-sensitive dynamic optimization. To tackle this challenge, we propose a reinforcement learning algorithm that simultaneously conducts policy evaluation and improvement based on a CVaR-specific Bellman local optimality equation. This algorithm employs a nonparametric incremental learning approach for policy improvement, relying on a single sample trajectory to identify the optimal policy. Under appropriate technical conditions, we prove almost sure convergence of…
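
For readers new to the risk measure, the following is a minimal sketch of how CVaR can be estimated from a finite sample of costs. It is not the paper's algorithm and does not involve an MDP or a learning rule; it only illustrates the standard Rockafellar-Uryasev representation, CVaR_alpha(X) = min over y of { y + E[(X - y)^+] / (1 - alpha) }, whose minimizer is the alpha-quantile (the value-at-risk). The function name `empirical_var_cvar` and its arguments are assumptions made for this illustration.

```python
import math

import numpy as np


def empirical_var_cvar(costs, alpha):
    """Return (VaR, CVaR) of a cost sample at confidence level alpha.

    Illustrative helper only (hypothetical name, not from the paper).
    Costs are treated as losses, so larger values are worse.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    sorted_costs = np.sort(np.asarray(list(costs), dtype=float))
    n = sorted_costs.size
    if n == 0:
        raise ValueError("costs must contain at least one sample")

    # Lower empirical alpha-quantile: smallest x with F_n(x) >= alpha.
    k = min(n - 1, max(0, math.ceil(alpha * n) - 1))
    var = sorted_costs[k]

    # Rockafellar-Uryasev objective evaluated at its minimizer y = VaR.
    excess = np.maximum(sorted_costs - var, 0.0)
    cvar = var + excess.mean() / (1.0 - alpha)
    return float(var), float(cvar)
```

As a usage example, calling `empirical_var_cvar(range(1, 11), 0.8)` on the costs 1 through 10 gives a VaR of 8 and a CVaR of 9.5, which is the mean of the two largest costs (the worst 20 percent of the sample).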


Taxonomy

Topics: Risk and Portfolio Optimization · Supply Chain and Inventory Management · Reinforcement Learning in Robotics