Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients
Parisa Davar, Frédéric Godin, Jose Garrido

TL;DR
This paper introduces POTPG, a policy gradient method leveraging extreme value theory to effectively mitigate low-frequency, high-severity catastrophic risks in sequential decision making, with demonstrated success in financial risk management.
Contribution
The paper presents a novel policy gradient algorithm, POTPG, that incorporates extreme value theory to better handle tail risks in reinforcement learning.
Findings
In numerical experiments, POTPG outperforms common benchmarks that rely on the empirical distribution.
The method effectively models tail risks with limited data.
An application to the dynamic hedging of a financial option demonstrates practical utility.
Abstract
This paper tackles the problem of mitigating catastrophic risk (risk with very low frequency but very high severity) in a sequential decision-making process. The problem is particularly challenging because observations are scarce in the far tail of the distribution of cumulative costs (negative rewards). We develop a policy gradient algorithm, called POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments show that our method outperforms common benchmarks that rely on the empirical distribution. An application to financial risk management, namely the dynamic hedging of a financial option, is presented.
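The abstract does not spell out the tail-risk approximation, so the following is only a rough illustration of the kind of estimate extreme value theory makes available, not the authors' algorithm. It uses the textbook peaks-over-threshold construction: fit a generalized Pareto distribution (GPD) to the costs exceeding a high threshold, then extrapolate to a far-tail risk measure. Everything specific here is an assumption made for the sketch: CVaR as the risk measure, the method-of-moments fit, the 95% threshold, the exponential toy costs, and all function names. The policy-gradient step itself is not shown.

```python
import math
import random


def fit_gpd_moments(excesses):
    """Method-of-moments fit of a GPD to threshold excesses.

    Returns (xi, sigma), the shape and scale. This estimator assumes a
    finite variance, so the fitted shape is always below 0.5.
    """
    n = len(excesses)
    mean = sum(excesses) / n
    variance = sum((y - mean) ** 2 for y in excesses) / (n - 1)
    ratio = mean * mean / variance
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * mean * (1.0 + ratio)
    return xi, sigma


def pot_var_cvar(costs, alpha=0.999, threshold_quantile=0.95):
    """Peaks-over-threshold estimates of VaR and CVaR at level alpha.

    Returns (u, value_at_risk, cvar), where u is the threshold.
    Requires alpha to lie beyond the threshold quantile.
    """
    ordered = sorted(costs)
    n = len(ordered)
    u = ordered[int(threshold_quantile * n)]
    excesses = [c - u for c in ordered if c > u]
    xi, sigma = fit_gpd_moments(excesses)

    # Target tail probability relative to the threshold exceedance rate.
    tail_ratio = (n / len(excesses)) * (1.0 - alpha)
    if abs(xi) < 1e-8:
        # Limit of the GPD quantile formula as the shape tends to zero.
        value_at_risk = u - sigma * math.log(tail_ratio)
    else:
        value_at_risk = u + sigma / xi * (tail_ratio ** (-xi) - 1.0)
    cvar = (value_at_risk + sigma - xi * u) / (1.0 - xi)
    return u, value_at_risk, cvar


def empirical_cvar(costs, alpha):
    """Benchmark: average of the costs above the empirical alpha-quantile."""
    ordered = sorted(costs)
    k = int(math.ceil(alpha * len(ordered)))
    tail = ordered[k:] or ordered[-1:]
    return sum(tail) / len(tail)


# Toy cumulative costs (hypothetical data, not from the paper).
random.seed(0)
toy_costs = [random.expovariate(1.0) for _ in range(20000)]
u, var_999, cvar_999 = pot_var_cvar(toy_costs, alpha=0.999)
print("threshold:", u)
print("POT VaR / CVaR at 99.9%:", var_999, cvar_999)
print("empirical CVaR at 99.9%:", empirical_cvar(toy_costs, 0.999))
```

With 20,000 samples, the empirical CVaR at the 99.9% level averages only about 20 observations, whereas the GPD fit draws on roughly 1,000 exceedances of the threshold. This is the data-scarcity issue the abstract describes.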
Taxonomy
Topics: Reinforcement Learning in Robotics
