Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions
Shanyu Han, Yang Liu, Xiang Yu

TL;DR
This paper introduces a risk-sensitive reinforcement learning framework based on convex scoring functions. It addresses time-inconsistency with an augmented state space and an auxiliary variable, and proposes a customized Actor-Critic algorithm with approximation guarantees, validated in financial simulations.
Contribution
It develops an RL approach for a broad class of risk measures, overcoming time-inconsistency and providing theoretical guarantees that do not require the Markov decision process to be continuous.
Findings
Effective in financial trading simulations
Handles a wide range of risk measures
Auxiliary variable sampling method that converges under certain conditions
Abstract
We propose a reinforcement learning (RL) framework under a broad class of risk objectives, characterized by convex scoring functions. This class covers many common risk measures, such as variance, Expected Shortfall, entropic Value-at-Risk, and mean-risk utility. To resolve the time-inconsistency issue, we consider an augmented state space and an auxiliary variable and recast the problem as a two-state optimization problem. We propose a customized Actor-Critic algorithm and establish some theoretical approximation guarantees. A key theoretical contribution is that our results do not require the Markov decision process to be continuous. Additionally, we propose an auxiliary variable sampling method inspired by the alternating minimization algorithm, which is convergent under certain conditions. We validate our approach in simulation experiments with a financial application in statistical…
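As a rough illustration of what "characterized by convex scoring functions" can mean, the sketch below uses two textbook facts: variance is the minimum over an auxiliary variable y of E[(X - y)^2], and Expected Shortfall at level alpha is the minimum over y of E[y + (X - y)^+ / (1 - alpha)] (the Rockafellar-Uryasev representation, with X read as a loss). The function names, the ternary-search minimizer, and the toy sample are assumptions made for this example; this is not the paper's Actor-Critic algorithm or its auxiliary variable sampling method, and the paper's exact scoring-function definitions may differ.

```python
from functools import partial


def variance_score(x, y):
    # Squared-error score: its mean over x is minimized at y = mean(x),
    # and the minimum value is the (population) variance.
    return (x - y) ** 2


def es_score(x, y, alpha):
    # Rockafellar-Uryasev score for a loss x: its mean is minimized at an
    # alpha-quantile of x, and the minimum value is the Expected Shortfall.
    return y + max(x - y, 0.0) / (1.0 - alpha)


def empirical_risk(samples, score, y):
    # Sample average of the score for a fixed auxiliary variable y.
    return sum(score(x, y) for x in samples) / len(samples)


def minimize_over_auxiliary(samples, score, iters=200):
    # Ternary search over y in [min(samples), max(samples)].
    # Valid here because both scores are convex in y.
    lo, hi = min(samples), max(samples)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if empirical_risk(samples, score, m1) < empirical_risk(samples, score, m2):
            hi = m2
        else:
            lo = m1
    y_star = 0.5 * (lo + hi)
    return y_star, empirical_risk(samples, score, y_star)


# Toy losses 1, 2, ..., 10 (hypothetical data, for illustration only).
losses = [float(i) for i in range(1, 11)]
y_var, variance = minimize_over_auxiliary(losses, variance_score)
y_es, shortfall = minimize_over_auxiliary(losses, partial(es_score, alpha=0.8))
```

On this toy sample the minimizing y for the variance score is the sample mean, and the minimum for the Expected Shortfall score equals the average of the two largest losses. In an RL setting, X would be replaced by a function of the cumulative reward under a policy, which is where the outer minimization over the auxiliary variable and the inner optimization over policies on the augmented state space would come in.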
Taxonomy
Topics: Elevator Systems and Control
