Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition

Sanjeev Manivannan; Shuban V

arXiv:2605.14982·cs.LG·May 15, 2026

TL;DR

This paper introduces a second-order actor-critic method for reinforcement learning in discounted MDPs. It uses Hessian-vector products within a two-timescale framework to improve convergence stability and efficiency.

Contribution

The paper develops a second-order actor-critic algorithm that incorporates curvature information through Hessian-vector products, addressing the computational cost of Hessian estimation that limits second-order methods in RL.
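As a rough illustration of why Hessian-vector products are cheap, the sketch below computes Hv from two gradient evaluations without ever forming the Hessian. It is a generic central-difference approximation, not the estimator used in the paper; the names `hvp_finite_difference` and `toy_grad`, the toy objective, and the step `eps` are all assumptions made for this example. In practice an automatic-differentiation library would usually replace the finite difference.

```python
from typing import Callable, List

Vector = List[float]


def hvp_finite_difference(
    grad_fn: Callable[[Vector], Vector],
    theta: Vector,
    v: Vector,
    eps: float = 1e-5,
) -> Vector:
    """Approximate H(theta) @ v with a central difference of the gradient.

    Uses Hv ~= (g(theta + eps*v) - g(theta - eps*v)) / (2*eps), so the cost is
    two gradient evaluations and the Hessian matrix is never built.
    """
    theta_plus = [t + eps * vi for t, vi in zip(theta, v)]
    theta_minus = [t - eps * vi for t, vi in zip(theta, v)]
    g_plus = grad_fn(theta_plus)
    g_minus = grad_fn(theta_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def toy_grad(theta: Vector) -> Vector:
    """Gradient of the hypothetical objective f(x, y) = x**4 / 4 + x*y + y**2."""
    x, y = theta
    return [x ** 3 + y, x + 2.0 * y]


# At (1, 2) the exact Hessian is [[3, 1], [1, 2]], so H @ [1, -1] = [2, -1].
theta = [1.0, 2.0]
v = [1.0, -1.0]
hv = hvp_finite_difference(toy_grad, theta, v)
```

The same pattern scales to large parameter vectors because only gradients are evaluated, which is the general reason Hessian-vector products are attractive for second-order updates.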

Findings

01. The method accelerates convergence compared to first-order approaches.

02. Hessian-vector product computations enable efficient second-order updates.

03. The two-timescale framework justifies treating the action-value function as locally constant with respect to the policy parameters.
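The two-timescale idea can be sketched on a toy problem: the critic uses a larger step size than the actor, so the actor's parameters barely move between critic updates. The step-size exponents (0.6 and 0.9), the scalar toy problem, and the names `critic_step` and `actor_step` are assumptions chosen for illustration; none of them come from the paper.

```python
def critic_step(k: int) -> float:
    """Fast timescale: step size decays slowly (hypothetical exponent 0.6)."""
    return 1.0 / (k + 1) ** 0.6


def actor_step(k: int) -> float:
    """Slow timescale: step size decays faster (hypothetical exponent 0.9)."""
    return 1.0 / (k + 1) ** 0.9


# Toy coupled problem (an assumption for this sketch):
#   the critic estimate w should track the target q(theta) = 2 * theta,
#   the actor minimises (theta - 1)**2, whose gradient is q(theta) - 2,
#   and the actor only sees the critic's estimate w in place of q(theta).
theta, w = 0.0, 0.0
for k in range(20000):
    # Critic update on the fast timescale: move w toward the current target.
    w += critic_step(k) * (2.0 * theta - w)
    # Actor update on the slow timescale, using w as a stand-in for q(theta).
    theta -= actor_step(k) * (w - 2.0)

# The ratio actor_step(k) / critic_step(k) shrinks toward zero as k grows,
# which is the defining property of a two-timescale schedule.
ratio_early = actor_step(10) / critic_step(10)
ratio_late = actor_step(1000) / critic_step(1000)
```

Because the actor's step shrinks relative to the critic's, the critic sees an almost stationary policy. That is the intuition behind treating the value estimate as locally constant during an actor update.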

Abstract

We address the discounted reward setting in reinforcement learning (RL). To mitigate the value approximation challenges in policy gradient methods, actor-critic approaches have been developed and are known to converge to stationary points under suitable assumptions. However, these methods rely on first-order updates. In contrast, second-order optimization provides principled curvature-aware updates that are proven to accelerate convergence, but its application in RL is limited by the computational complexity of Hessian estimation. In this work, we analyze second-order approximations for the actor update that leverage the full curvature information of the objective as much as possible. A stable approximation requires treating the action-value function as locally constant with respect to policy parameters, which does not generally hold in policy gradient methods. We show that this…
