Policy Gradient with Second Order Momentum

Tianyu Sun

arXiv:2505.11561 · cs.LG · May 20, 2025

TL;DR

This paper introduces PG-SOM, a second-order optimization method for reinforcement learning that improves sample efficiency and stability by using a diagonal Hessian approximation to precondition policy gradients.

Contribution

The paper presents PG-SOM, a lightweight second-order optimizer for policy gradients that uses a diagonal Hessian estimate to precondition the gradient, rescaling the step taken for each parameter.

Findings

1. Up to a 2.1x increase in sample efficiency on standard control benchmarks.

2. A substantial reduction in variance compared to first-order baselines.

3. Only O(D) additional memory for a D-parameter policy.

Abstract

We develop Policy Gradient with Second-Order Momentum (PG-SOM), a lightweight second-order optimisation scheme for reinforcement-learning policies. PG-SOM augments the classical REINFORCE update with two exponentially weighted statistics: a first-order gradient average and a diagonal approximation of the Hessian. By preconditioning the gradient with this curvature estimate, the method adaptively rescales each parameter, yielding faster and more stable ascent of the expected return. We provide a concise derivation, establish that the diagonal Hessian estimator is unbiased and positive-definite under mild regularity assumptions, and prove that the resulting update is a descent direction in expectation. Numerical experiments on standard control benchmarks show up to a 2.1x increase in sample efficiency and a substantial reduction in variance compared to first-order and Fisher-matrix…
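The update described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `pg_som_step`, the hyperparameter values, the Adam-style bias correction, and the `eps` damping term are all assumptions, and the toy quadratic objective stands in for a real REINFORCE gradient and curvature estimator, which the excerpt does not specify.

```python
import numpy as np

def pg_som_step(theta, grad, curv, state, lr=0.1, beta1=0.9, beta2=0.99, eps=1e-8):
    """One preconditioned ascent step (hypothetical sketch).

    grad : estimate of the gradient of the expected return
    curv : estimate of the diagonal curvature magnitude (assumed non-negative)
    state: dict holding the step count and the two running averages
    """
    state["t"] += 1
    t = state["t"]
    # Exponentially weighted first-order statistic (gradient average).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # Exponentially weighted diagonal curvature statistic.
    state["h"] = beta2 * state["h"] + (1 - beta2) * curv
    # Bias correction borrowed from Adam (an assumption, not from the source).
    m_hat = state["m"] / (1 - beta1 ** t)
    h_hat = state["h"] / (1 - beta2 ** t)
    # Precondition: each parameter's step is rescaled by its own curvature.
    return theta + lr * m_hat / (np.abs(h_hat) + eps)

def run_demo(steps=300):
    # Toy objective J(theta) = -0.5 * sum(a * theta**2), maximised at theta = 0.
    # The two coordinates differ in curvature by a factor of 100.
    a = np.array([1.0, 100.0])
    theta = np.array([1.0, 1.0])
    state = {"t": 0, "m": np.zeros_like(theta), "h": np.zeros_like(theta)}
    for _ in range(steps):
        grad = -a * theta   # exact gradient of the toy objective
        curv = a            # exact diagonal of the negated Hessian
        theta = pg_som_step(theta, grad, curv, state)
    return theta

if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the preconditioner cancels the 100x curvature gap, so both coordinates approach the optimum at the same rate with a single learning rate. The extra state is two vectors the size of `theta`, in line with the stated O(D) memory overhead.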

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research

Methods: REINFORCE