Policy Newton methods for Distortion Riskmetrics

Soumen Pachal; Mizhaan Prajit Maniyar; Prashanth L.A

arXiv:2508.07249·cs.LG·August 12, 2025

TL;DR

This paper introduces a novel policy Newton method for risk-sensitive reinforcement learning that maximizes distortion riskmetrics, providing convergence guarantees to second-order stationary points and demonstrating effectiveness through experiments.

Contribution

It develops a new Hessian estimator for DRM objectives and proposes a cubic-regularized policy Newton algorithm with convergence guarantees to second-order stationary points.
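As a rough illustration of what one cubic-regularized Newton update involves, the sketch below solves the generic cubic subproblem for a maximization objective: given a gradient estimate `grad` and a Hessian estimate `hess`, it finds the step `s` maximizing `grad·s + 0.5 s^T hess s - (M/6)||s||^3`. The function name `cubic_newton_step`, the parameter `M`, and the eigendecomposition-plus-bisection solver are assumptions made for this example, not the paper's procedure, and the degenerate "hard case" of the subproblem is not handled.

```python
import numpy as np

def cubic_newton_step(grad, hess, M, tol=1e-10, max_iter=200):
    """Sketch: maximize g.s + 0.5 s^T H s - (M/6)||s||^3 over s.

    Hypothetical helper for illustration; ignores the degenerate 'hard case'.
    """
    # Convert to the equivalent minimization problem by negating g and H.
    g = -np.asarray(grad, dtype=float)
    H = -np.asarray(hess, dtype=float)
    H = 0.5 * (H + H.T)  # symmetrize a possibly noisy Hessian estimate
    eigvals, eigvecs = np.linalg.eigh(H)
    g_rot = eigvecs.T @ g

    def step_norm(r):
        # Norm of s(r) = -(H + (M/2) r I)^{-1} g in the eigenbasis.
        return np.linalg.norm(g_rot / (eigvals + 0.5 * M * r))

    # r = ||s|| must make H + (M/2) r I positive definite.
    lo = max(0.0, -2.0 * eigvals[0] / M) + 1e-12
    hi = max(lo, 1.0)
    while step_norm(hi) > hi:
        hi *= 2.0
    # Bisection on the scalar equation ||s(r)|| = r.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    r = 0.5 * (lo + hi)
    return -eigvecs @ (g_rot / (eigvals + 0.5 * M * r))
```

In a policy Newton loop, the policy parameter would then be moved by the returned step, with `grad` and `hess` replaced by sample-based estimates of the DRM gradient and Hessian.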

Findings

01

Algorithm converges to an $\epsilon$-second-order stationary point.

02

Sample complexity is $\mathcal{O}(\epsilon^{-3.5})$ for convergence.

03

Experiments validate theoretical convergence results.

Abstract

We consider the problem of risk-sensitive control in a reinforcement learning (RL) framework. In particular, we aim to find a risk-optimal policy by maximizing the distortion riskmetric (DRM) of the discounted reward in a finite-horizon Markov decision process (MDP). DRMs are a rich class of risk measures that include several well-known risk measures as special cases. We derive a policy Hessian theorem for the DRM objective using the likelihood ratio method. Using this result, we propose a natural DRM Hessian estimator from sample trajectories of the underlying MDP. Next, we present a cubic-regularized policy Newton algorithm for solving this problem in an on-policy RL setting using estimates of the DRM gradient and Hessian. Our proposed algorithm is shown to converge to an $\epsilon$-second-order stationary point ($\epsilon$-SOSP) of the DRM objective, and this guarantee ensures the…
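To make the DRM objective concrete, here is a minimal sketch of a plug-in estimate of a distortion riskmetric from sampled returns. It assumes a distortion function `g` on [0, 1] with `g(0) = 0` and weights the sorted samples by increments of `g`, which is the Choquet integral of the empirical distribution. The name `empirical_drm` and this particular estimator are illustrative assumptions and not necessarily what the paper uses.

```python
import numpy as np

def empirical_drm(returns, distortion):
    """Sketch: plug-in DRM estimate from i.i.d. return samples.

    `distortion` is a vectorized function g on [0, 1] with g(0) = 0
    (hypothetical interface chosen for this example).
    """
    x = np.sort(np.asarray(returns, dtype=float))  # order statistics x_(1) <= ... <= x_(m)
    m = x.size
    # Weight on x_(i) is g(1 - (i-1)/m) - g(1 - i/m), for i = 1..m.
    upper = distortion(1.0 - np.arange(0, m) / m)
    lower = distortion(1.0 - np.arange(1, m + 1) / m)
    return float(np.sum(x * (upper - lower)))

# The identity distortion recovers the sample mean.
mean_value = empirical_drm([1.0, 2.0, 3.0, 4.0], lambda t: t)

# g(t) = min(t / 0.5, 1) averages the upper half of the samples.
upper_half = empirical_drm([4.0, 1.0, 3.0, 2.0], lambda t: np.minimum(t / 0.5, 1.0))
```

Different choices of `distortion` recover different risk measures, which is the sense in which DRMs include several well-known risk measures as special cases.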

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques