Long-Run Conditional Value-at-Risk Reinforcement Learning
Qixin Wang, Hao Cao, Jian-Qiang Hu, Mingjie Hu, Li Xia

TL;DR
This paper introduces a model-free reinforcement learning algorithm for long-run CVaR optimization in MDPs, providing convergence guarantees and extending to mean-CVaR problems, with numerical validation.
Contribution
The paper develops a CVaR-specific RL algorithm with a convergence analysis and extends it to mean-CVaR optimization, targeting settings where the transition model is unknown.
Findings
Algorithm converges almost surely under technical conditions.
Convergence rate is of order O(1/n).
Numerical experiments validate theoretical results.
Abstract
Conditional value-at-risk (CVaR) is a prominent risk measure in financial engineering, energy systems, and supply chain management. In these domains, Markov decision processes (MDPs) with a long-run CVaR criterion effectively mitigate cost variability over a specified horizon. However, implementing MDPs relies on known transition models, which are typically unavailable in practice. This necessitates a model-free approach to risk-sensitive dynamic optimization. To tackle this challenge, we propose a reinforcement learning algorithm that simultaneously conducts policy evaluation and improvement based on a CVaR-specific Bellman local optimality equation. This algorithm employs a nonparametric incremental learning approach for policy improvement, relying on a single sample trajectory to identify the optimal policy. Under appropriate technical conditions, we prove almost sure convergence of…
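To make the risk measure in the abstract concrete, here is a minimal sketch of estimating CVaR from costs observed along a single sample trajectory. It is not the paper's algorithm. It only evaluates the standard Rockafellar-Uryasev objective, CVaR_alpha(X) = min_y { y + E[(X - y)^+] / (1 - alpha) }, at the empirical alpha-quantile. The function name `empirical_cvar` and the sample costs are hypothetical, and treating trajectory costs as samples of a long-run cost distribution is an assumption made for illustration.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Estimate (VaR, CVaR) of a cost sample at confidence level alpha.

    Uses the Rockafellar-Uryasev form
        CVaR = y + mean(max(cost - y, 0)) / (1 - alpha),
    evaluated at y = empirical alpha-quantile (the VaR estimate).
    Illustrative only; not the algorithm proposed in the paper.
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)            # empirical VaR estimate
    excess = np.maximum(costs - var, 0.0)      # cost above the VaR level
    cvar = var + excess.mean() / (1.0 - alpha)
    return var, cvar

# Hypothetical costs 1, 2, ..., 100 observed along one trajectory.
costs = np.arange(1, 101)
var, cvar = empirical_cvar(costs, alpha=0.9)
```

For this sample the estimate coincides with the average of the ten largest costs, which matches the usual reading of CVaR as the expected cost in the worst (1 - alpha) tail.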
Taxonomy
Topics: Risk and Portfolio Optimization · Supply Chain and Inventory Management · Reinforcement Learning in Robotics
