Policy Gradient Methods for Distortion Risk Measures
Nithia Vijayan, Prashanth L.A

TL;DR
This paper introduces policy gradient algorithms tailored for risk-sensitive reinforcement learning, optimizing distortion risk measures to account for risk preferences in decision-making.
Contribution
The paper derives a variant of the policy gradient theorem for the distortion risk measure (DRM) objective and provides non-asymptotic convergence guarantees for both the on-policy and off-policy algorithms.
Findings
The proposed algorithms maximize the DRM of the cumulative reward in an episodic Markov decision process
Non-asymptotic bounds establish convergence to an approximate stationary point of the DRM objective
Risk-sensitive policies outperform risk-neutral ones in experiments
Abstract
We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision process in on-policy and off-policy RL settings, respectively. We derive a variant of the policy gradient theorem that caters to the DRM objective, and integrate it with a likelihood ratio-based gradient estimation scheme. We derive non-asymptotic bounds that establish the convergence of our proposed algorithms to an approximate stationary point of the DRM objective.
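To make the DRM objective in the abstract concrete, the following is a minimal sketch of how a DRM can be estimated from sampled episode returns. It is not the authors' code. It assumes the standard order-statistic form of the empirical DRM, in which the sorted returns X_(1) <= ... <= X_(m) are weighted by increments of a distortion function g with g(0) = 0 and g(1) = 1. The names `empirical_drm`, `identity`, and `square` are hypothetical and chosen only for illustration.

```python
import numpy as np


def empirical_drm(returns, distortion):
    """Estimate the DRM of a return distribution from i.i.d. samples.

    Assumed estimator (order-statistic form):
        sum_i X_(i) * [g(1 - (i - 1)/m) - g(1 - i/m)]
    where X_(1) <= ... <= X_(m) are the sorted samples and g is the
    distortion function.
    """
    x = np.sort(np.asarray(returns, dtype=float))  # ascending order statistics
    m = x.size
    i = np.arange(1, m + 1)
    # Weight on the i-th smallest return; the weights sum to g(1) - g(0) = 1.
    weights = distortion(1.0 - (i - 1) / m) - distortion(1.0 - i / m)
    return float(np.dot(weights, x))


def identity(u):
    # g(u) = u gives equal weights 1/m, i.e. the risk-neutral sample mean.
    return u


def square(u):
    # A convex distortion (illustrative choice) puts more weight on low returns.
    return u ** 2


samples = [4.0, 0.0]  # two hypothetical episode returns
drm_neutral = empirical_drm(samples, identity)
drm_averse = empirical_drm(samples, square)
```

With the identity distortion the estimate reduces to the sample mean, which recovers the usual risk-neutral objective. A convex distortion such as the square shifts weight toward the lowest returns, so the resulting DRM is no larger than the mean; maximizing it favors policies with better worst-case episodes.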
Taxonomy
Topics: Electric Power System Optimization
