Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition
Sanjeev Manivannan, Shuban V

TL;DR
This paper introduces a second-order actor-critic method for reinforcement learning in discounted MDPs. It uses Hessian-vector products within a two-timescale framework to improve convergence stability and efficiency.
Contribution
It develops a second-order actor-critic algorithm that incorporates curvature information through Hessian-vector products, avoiding the cost of explicit Hessian estimation that has limited second-order methods in RL.
Findings
The method accelerates convergence compared to first-order approaches.
Hessian-vector product computations enable efficient second-order updates.
The two-timescale framework justifies treating the action-value function as locally constant with respect to the policy parameters.
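To make the Hessian-vector product idea concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes a toy single-state problem with a softmax policy, a fixed vector q standing in for the action-value function (held constant with respect to the policy parameters), and a central finite difference of the gradient in place of automatic differentiation. All function names are hypothetical.

```python
import math

def softmax(theta):
    # Numerically stable softmax over action preferences.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_objective(theta, q):
    # Gradient of J(theta) = sum_a pi_theta(a) * q[a], with q held fixed
    # (the "locally constant action-value" assumption).
    pi = softmax(theta)
    baseline = sum(p * qa for p, qa in zip(pi, q))
    return [p * (qa - baseline) for p, qa in zip(pi, q)]

def hvp(theta, q, v, eps=1e-5):
    # Hessian-vector product via a central difference of the gradient:
    # H v ~= (g(theta + eps*v) - g(theta - eps*v)) / (2*eps).
    # Only two gradient evaluations are needed; the Hessian is never formed.
    plus = grad_objective([t + eps * vi for t, vi in zip(theta, v)], q)
    minus = grad_objective([t - eps * vi for t, vi in zip(theta, v)], q)
    return [(a - b) / (2 * eps) for a, b in zip(plus, minus)]

def explicit_hessian(theta, q):
    # Analytic Hessian for this toy softmax case, used only as a check.
    pi = softmax(theta)
    b = sum(p * qa for p, qa in zip(pi, q))
    n = len(theta)
    return [
        [
            pi[i] * ((1.0 if i == j else 0.0) - pi[j]) * (q[i] - b)
            - pi[i] * pi[j] * (q[j] - b)
            for j in range(n)
        ]
        for i in range(n)
    ]

theta = [0.1, -0.3, 0.5]   # illustrative policy parameters
q = [1.0, 2.0, 0.5]        # illustrative fixed action values
v = [0.2, -1.0, 0.7]       # direction for the product
hv = hvp(theta, q, v)
```

The explicit Hessian costs O(n^2) entries to build, while the product costs two gradient evaluations; that gap is why methods built on Hessian-vector products can scale to larger parameter vectors. In practice an automatic-differentiation library would typically replace the finite difference.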
Abstract
We address the discounted reward setting in reinforcement learning (RL). To mitigate the value approximation challenges in policy gradient methods, actor-critic approaches have been developed and are known to converge to stationary points under suitable assumptions. However, these methods rely on first-order updates. In contrast, second-order optimization provides principled curvature-aware updates that are proven to accelerate convergence, but its application in RL is limited by the computational complexity of Hessian estimation. In this work, we analyze second-order approximations for the actor update that leverage the full curvature information of the objective as much as possible. A stable approximation requires treating the action-value function as locally constant with respect to policy parameters, which does not generally hold in policy gradient methods. We show that this…
