Policy Gradient with Second Order Momentum
Tianyu Sun

TL;DR
This paper introduces PG-SOM, a second-order optimization method for reinforcement learning that improves sample efficiency and stability by using a diagonal Hessian approximation to precondition policy gradients.
Contribution
The paper presents PG-SOM, a novel lightweight second-order optimizer for policy gradients that leverages a diagonal Hessian estimate to enhance learning performance.
Findings
Up to 2.1x increase in sample efficiency.
Substantial reduction in variance compared to first-order and Fisher-matrix baselines.
The method incurs only O(D) memory overhead for a D-parameter policy.
Abstract
We develop Policy Gradient with Second-Order Momentum (PG-SOM), a lightweight second-order optimisation scheme for reinforcement-learning policies. PG-SOM augments the classical REINFORCE update with two exponentially weighted statistics: a first-order gradient average and a diagonal approximation of the Hessian. By preconditioning the gradient with this curvature estimate, the method adaptively rescales each parameter, yielding faster and more stable ascent of the expected return. We provide a concise derivation, establish that the diagonal Hessian estimator is unbiased and positive-definite under mild regularity assumptions, and prove that the resulting update is a descent direction in expectation. Numerical experiments on standard control benchmarks show up to a 2.1x increase in sample efficiency and a substantial reduction in variance compared to first-order and Fisher-matrix…
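The update described in the abstract can be illustrated with a minimal NumPy sketch. It keeps two exponentially weighted statistics, a gradient average and a diagonal curvature estimate, and divides one by the other elementwise. Several pieces are assumptions rather than the paper's method: the curvature proxy is the elementwise squared gradient (a stand-in, since the paper's diagonal Hessian estimator is not given here), and the names `reinforce_grad`, `pg_som_step`, `beta1`, `beta2`, `damping`, the toy bandit rewards, and all default values are hypothetical.

```python
import numpy as np

def reinforce_grad(theta, rng, rewards):
    """Single-sample REINFORCE gradient for a softmax policy over bandit arms."""
    z = np.exp(theta - theta.max())          # numerically stable softmax
    probs = z / z.sum()
    a = rng.choice(len(theta), p=probs)      # sample one action from the policy
    grad_logp = -probs                       # d log pi(a) / d theta = onehot(a) - probs
    grad_logp[a] += 1.0
    return rewards[a] * grad_logp            # reward-weighted score function

def pg_som_step(theta, grad, m, h, lr=0.1, beta1=0.9, beta2=0.99, damping=1e-3):
    """One diagonally preconditioned ascent step (illustrative sketch only)."""
    m = beta1 * m + (1.0 - beta1) * grad         # exponentially weighted gradient average
    # ASSUMPTION: squared gradient as a non-negative curvature proxy,
    # standing in for the paper's diagonal Hessian estimator.
    h = beta2 * h + (1.0 - beta2) * grad * grad
    theta = theta + lr * m / (h + damping)       # per-parameter rescaled ascent on the return
    return theta, m, h

# Toy usage: a 3-armed bandit with hypothetical fixed rewards.
rng = np.random.default_rng(0)
rewards = np.array([0.2, 0.5, 1.0])
theta, m, h = np.zeros(3), np.zeros(3), np.zeros(3)
for _ in range(200):
    g = reinforce_grad(theta, rng, rewards)
    theta, m, h = pg_som_step(theta, g, m, h)
```

The optimizer state in this sketch is two vectors with the same shape as the parameters, and the `damping` term keeps the elementwise division bounded when the curvature proxy is close to zero.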
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
Methods: REINFORCE
