Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk
Sumedh Gupte, Shrey Rakeshkumar Patel, Soumen Pachal, Prashanth L. A., Sanjay P. Bhat

TL;DR
This paper develops risk-sensitive reinforcement learning algorithms for expectiles, shortfall risk, and optimized certainty equivalent risk, deriving policy gradient theorems, estimators, and convergence bounds, validated through experiments.
Contribution
It introduces novel risk-sensitive policy gradient algorithms for three risk measures, with theoretical convergence guarantees and empirical validation.
Findings
Established $\text{O}(1/m)$ mean-squared error bounds for the policy gradient estimators, where $m$ is the number of trajectories
Proved smoothness of the risk-sensitive objectives and derived stationary convergence rate bounds for the resulting policy gradient algorithm
Validated algorithms on popular RL benchmarks
Abstract
We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent risk. For each risk measure, in the context of a finite horizon Markov decision process, we first derive a policy gradient theorem. Second, we propose estimators of the risk-sensitive policy gradient for each of the aforementioned risk measures, and establish $\text{O}(1/m)$ mean-squared error bounds for our estimators, where $m$ is the number of trajectories. Further, under standard assumptions for policy gradient-type algorithms, we establish smoothness of the risk-sensitive objective, in turn leading to stationary convergence rate bounds for the overall risk-sensitive policy gradient algorithm that we propose. Finally, we conduct numerical experiments to validate the theoretical…
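To make the first of the three risk measures concrete, here is a minimal sketch of estimating an expectile from a batch of sampled trajectory returns. It is not the paper's estimator: it only uses the standard first-order characterization of the $\tau$-expectile $q$, namely $\tau\,\mathbb{E}[(X-q)_+] = (1-\tau)\,\mathbb{E}[(q-X)_+]$, and solves the sample version by bisection. The function name `sample_expectile`, the level `tau`, and the Gaussian stand-in for trajectory returns are all assumptions made for illustration.

```python
import random


def sample_expectile(returns, tau, tol=1e-10, max_iter=200):
    """Estimate the tau-expectile of a sample by bisection.

    Hypothetical helper for illustration only; solves the sample version of
    tau * E[(X - q)_+] = (1 - tau) * E[(q - X)_+] for q.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if not returns:
        raise ValueError("returns must be non-empty")
    n = len(returns)

    def g(q):
        # g is continuous and non-increasing in q; its root is the expectile.
        upper = sum(max(x - q, 0.0) for x in returns) / n
        lower = sum(max(q - x, 0.0) for x in returns) / n
        return tau * upper - (1.0 - tau) * lower

    # g(min) >= 0 and g(max) <= 0, so the root lies in [min, max].
    lo, hi = min(returns), max(returns)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Assumed stand-in for m = 1000 sampled trajectory returns.
random.seed(0)
returns = [random.gauss(1.0, 2.0) for _ in range(1000)]

e_half = sample_expectile(returns, 0.5)  # tau = 0.5 recovers the sample mean
e_low = sample_expectile(returns, 0.1)   # lower tau weights the lower tail more
```

At `tau = 0.5` the condition reduces to the mean, which gives a quick sanity check; levels below 0.5 pull the estimate toward the lower tail of the return distribution.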
Taxonomy
Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research
