Policy Gradient Methods for Distortion Risk Measures

Nithia Vijayan; Prashanth L.A

arXiv:2107.04422 · cs.LG · February 6, 2024

TL;DR

This paper introduces policy gradient algorithms tailored for risk-sensitive reinforcement learning, optimizing distortion risk measures to account for risk preferences in decision-making.

Contribution

The paper derives a variant of the policy gradient theorem for the distortion risk measure (DRM) objective and gives non-asymptotic convergence guarantees for both the on-policy and the off-policy algorithm.

Findings

01. Algorithms effectively optimize DRM in RL settings

02. Convergence bounds are established for the proposed methods

03. Risk-sensitive policies outperform risk-neutral ones in experiments

Abstract

We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision process in on-policy and off-policy RL settings, respectively. We derive a variant of the policy gradient theorem that caters to the DRM objective, and integrate it with a likelihood ratio-based gradient estimation scheme. We derive non-asymptotic bounds that establish the convergence of our proposed algorithms to an approximate stationary point of the DRM objective.
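
To make the DRM objective concrete, here is a minimal sketch of how a DRM could be estimated from sampled episode returns. It is not taken from the paper. It assumes the standard Choquet-integral form of a distortion risk measure, in which a distortion function g on [0, 1] with g(0) = 0 and g(1) = 1 is applied to the complementary empirical CDF, so the estimate becomes a weighted sum of the sorted returns. The names `empirical_drm` and `distortion` are hypothetical and chosen only for illustration.

```python
def empirical_drm(returns, distortion):
    """Estimate a distortion risk measure from sampled returns.

    Assumption (not from the paper): the DRM is the Choquet integral of the
    return under `distortion`, a non-decreasing map g on [0, 1] with
    g(0) = 0 and g(1) = 1. On n samples this reduces to
        sum_i x_(i) * (g((n - i + 1) / n) - g((n - i) / n)),
    where x_(1) <= ... <= x_(n) are the sorted returns.
    """
    xs = sorted(float(r) for r in returns)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sampled return")
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # Weight on the i-th smallest return: the increment of g across the
        # empirical survival probabilities just above and just below x.
        weight = distortion((n - i + 1) / n) - distortion((n - i) / n)
        total += weight * x
    return total


if __name__ == "__main__":
    sampled_returns = [4.0, 1.0, 3.0, 2.0]
    # The identity distortion recovers the ordinary sample mean (risk-neutral).
    print(empirical_drm(sampled_returns, lambda u: u))
    # A convex distortion such as g(u) = u**2 puts more weight on low returns.
    print(empirical_drm(sampled_returns, lambda u: u * u))
```

Under this form, the identity distortion gives the risk-neutral expected return, while a convex distortion emphasizes the worst outcomes and yields a more conservative value. A policy gradient method for this objective would differentiate such a quantity with respect to the policy parameters; that step is not sketched here.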

Taxonomy

Topics: Electric Power System Optimization