A policy gradient approach for optimization of smooth risk measures
Nithia Vijayan, Prashanth L.A

TL;DR
This paper introduces policy gradient algorithms tailored for optimizing smooth risk measures in reinforcement learning, applicable in both on-policy and off-policy scenarios, with theoretical convergence guarantees.
Contribution
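The paper develops two template policy gradient algorithms, one on-policy and one off-policy, that optimize smooth risk measures of the cumulative discounted reward in episodic Markov decision processes. It shows that mean-variance and distortion risk measures are covered as special cases, and gives a non-asymptotic convergence analysis.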
Findings
Both algorithms converge to a stationary point of the smooth risk measure
Mean-variance and distortion risk measures are covered as special cases
Non-asymptotic bounds quantify the rate of convergence
Abstract
We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward. We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings, respectively. We derive non-asymptotic bounds that quantify the rate of convergence of our proposed algorithms to a stationary point of the smooth risk measure. As special cases, we establish that our algorithms apply to optimization of mean-variance and distortion risk measures, respectively.
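To make the objects in the abstract concrete, the following is a minimal sketch of how the two special-case risk measures could be estimated from sampled episodes. It is not the paper's algorithm: the function names, the penalized form "mean minus lam times variance", and the example distortion function are assumptions chosen for illustration. The distortion estimate uses the standard weighted sum of order statistics.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Cumulative discounted reward of one episode: sum_t gamma^t * r_t."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(gamma ** np.arange(len(rewards)) * rewards))

def mean_variance(returns, lam):
    """Assumed objective form: sample mean minus lam times sample variance."""
    returns = np.asarray(returns, dtype=float)
    # np.var defaults to the population variance (ddof=0).
    return float(returns.mean() - lam * returns.var())

def distortion_risk(returns, g):
    """Empirical distortion risk measure as a weighted sum of order statistics.

    g is a distortion function on [0, 1] with g(0) = 0 and g(1) = 1.
    With g(u) = u the estimate reduces to the sample mean.
    """
    x = np.sort(np.asarray(returns, dtype=float))  # ascending order statistics
    n = len(x)
    i = np.arange(1, n + 1)
    # Weight on the i-th smallest return: g((n-i+1)/n) - g((n-i)/n).
    weights = g((n - i + 1) / n) - g((n - i) / n)
    return float(np.sum(weights * x))

# Hypothetical episodes (lists of per-step rewards), for illustration only.
episodes = [[1.0, 1.0, 1.0], [0.0, 2.0, 0.0], [3.0, 0.0, 0.0]]
returns = [discounted_return(ep, gamma=0.5) for ep in episodes]
mv = mean_variance(returns, lam=0.5)
dr = distortion_risk(returns, g=lambda u: u ** 2)
```

A convex distortion such as `g(u) = u ** 2` places more weight on the lowest returns, so the resulting estimate is more pessimistic than the plain mean. A policy gradient method for either measure would then adjust the policy parameters to increase such an estimate.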
Taxonomy
Topics: Advanced Causal Inference Techniques
