A policy gradient approach for optimization of smooth risk measures

Nithia Vijayan; Prashanth L.A

arXiv:2202.11046 · cs.LG · June 25, 2024

TL;DR

This paper introduces policy gradient algorithms for optimizing smooth risk measures of the cumulative discounted reward in episodic reinforcement learning. The algorithms cover both on-policy and off-policy settings and come with non-asymptotic convergence guarantees.

Contribution

The paper develops two template policy gradient algorithms for smooth risk measures in RL, one on-policy and one off-policy, analyzes their convergence, and shows that both apply to mean-variance and distortion risk measures as special cases.

Findings

01. Both algorithms converge to a stationary point of the smooth risk measure.

02. The algorithms apply to mean-variance and distortion risk measures as special cases.

03. Non-asymptotic bounds quantify the rate of convergence.

Abstract

We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward. We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings, respectively. We derive non-asymptotic bounds that quantify the rate of convergence of our proposed algorithms to a stationary point of the smooth risk measure. As special cases, we establish that our algorithms apply to optimization of mean-variance and distortion risk measures, respectively.
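
To make the quantities named in the abstract concrete, the following is a minimal sketch (not the paper's algorithms) of how the cumulative discounted reward of an episode and empirical versions of the two risk measures could be computed from sampled returns. The function names, the trade-off parameter `lam`, the `mean - lam * variance` form of the mean-variance objective, and the order-statistic estimator of the distortion risk measure are assumptions made for illustration; the paper's exact formulations and gradient estimators may differ.

```python
from typing import Callable, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Cumulative discounted reward of one episode: sum_t gamma^t * r_t."""
    total = 0.0
    # Accumulate backwards so each step applies one more factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def mean_variance(returns: Sequence[float], lam: float) -> float:
    """Empirical mean minus lam times empirical variance (assumed form)."""
    m = len(returns)
    mean = sum(returns) / m
    var = sum((x - mean) ** 2 for x in returns) / m
    return mean - lam * var


def distortion_risk(returns: Sequence[float], g: Callable[[float], float]) -> float:
    """Order-statistic estimate of a distortion risk measure.

    g is a distortion function on [0, 1] with g(0) = 0 and g(1) = 1.
    The i-th smallest return gets weight g(1 - i/m) - g(1 - (i+1)/m),
    so the identity distortion recovers the sample mean.
    """
    xs = sorted(returns)
    m = len(xs)
    return sum(
        x * (g(1 - i / m) - g(1 - (i + 1) / m)) for i, x in enumerate(xs)
    )


if __name__ == "__main__":
    # Hypothetical episode returns, used only to exercise the functions.
    episodes = [[1.0, 1.0, 1.0], [0.0, 2.0, 0.0], [3.0, 0.0, 0.0]]
    returns = [discounted_return(ep, gamma=0.9) for ep in episodes]
    print(mean_variance(returns, lam=0.5))
    print(distortion_risk(returns, g=lambda u: u * u))
```

With the convex distortion `g(u) = u * u`, low returns receive more weight than high ones, so the estimate falls below the sample mean. A policy gradient method in the spirit of the abstract would ascend an estimate of the gradient of such an objective with respect to the policy parameters, which this sketch does not attempt.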

Taxonomy

Topics: Advanced Causal Inference Techniques