A Variance-Reduced Cubic-Regularized Newton for Policy Optimization

Cheng Sun; Zhen Zhang; Shaofu Yang

arXiv:2507.10120·cs.LG·July 15, 2025


TL;DR

This paper introduces VR-CR-PN, a novel second-order policy optimization algorithm in reinforcement learning that uses variance reduction and cubic regularization to improve sample complexity without relying on importance sampling.

Contribution

To the authors' knowledge, it is the first algorithm to combine Hessian-aided variance reduction with second-order policy optimization, achieving horizon-independent sample complexity and improved theoretical guarantees.

Findings

01

Achieves $\tilde{O}(\epsilon^{-3})$ sample complexity for reaching an $\epsilon$-second-order stationary point.

02

Introduces a Hessian estimator with a horizon-independent upper bound.

03

Improves on previous methods, whose sample complexity was $\tilde{O}(\epsilon^{-3.5})$.
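
For reference, one common way to define an $\epsilon$-second-order stationary point when maximizing an objective $J(\theta)$ is sketched here. The symbols $J$, $\theta$, and the Hessian-Lipschitz constant $\rho$ are assumptions introduced for illustration; the paper's exact definition and constants may differ.

$$
\|\nabla J(\theta)\| \le \epsilon, \qquad \lambda_{\max}\big(\nabla^2 J(\theta)\big) \le \sqrt{\rho\,\epsilon}
$$

The first condition rules out points with a large gradient, and the second rules out strict saddle points, where the objective still has a direction of significant positive curvature to climb.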

Abstract

In this paper, we study a second-order approach to policy optimization in reinforcement learning. Existing second-order methods often suffer from suboptimal sample complexity or rely on unrealistic assumptions about importance sampling. To overcome these limitations, we propose VR-CR-PN, a variance-reduced cubic-regularized policy Newton algorithm. To the best of our knowledge, this is the first algorithm that integrates Hessian-aided variance reduction with second-order policy optimization, effectively addressing the distribution shift problem and achieving best-known sample complexity under general nonconvex conditions but without the need for importance sampling. We theoretically establish that VR-CR-PN achieves a sample complexity of $\tilde{O}(\epsilon^{-3})$ to reach an $\epsilon$-second-order stationary point, significantly improving upon the previous best result of…
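
To make the "cubic-regularized Newton" idea concrete, here is a minimal sketch of the generic cubic subproblem that such methods solve at each iteration. It is not the paper's VR-CR-PN algorithm: the function name `cubic_newton_step`, the regularization constant `M`, and the bisection solver are assumptions for illustration, and the variance-reduced gradient and Hessian estimators are not reproduced. For a maximization objective, one would pass the negated gradient and Hessian estimates.

```python
import numpy as np

def cubic_newton_step(g, H, M, tol=1e-10, max_iter=200):
    """Approximately minimize the cubic model
        m(h) = g @ h + 0.5 * h @ H @ h + (M / 6) * ||h||^3
    for a symmetric H and M > 0. The degenerate "hard case" is ignored.
    """
    lam, Q = np.linalg.eigh(H)      # eigenvalues in ascending order
    g_rot = Q.T @ g                 # gradient in the eigenbasis

    # A minimizer satisfies (H + (M/2) r I) h = -g with r = ||h||,
    # where H + (M/2) r I must be positive definite.
    r_lo = max(0.0, -2.0 * lam[0] / M) + 1e-12

    def step_norm(r):
        return np.linalg.norm(g_rot / (lam + 0.5 * M * r))

    # Grow the upper bracket until ||h(r_hi)|| <= r_hi.
    r_hi = max(2.0 * r_lo, 1.0)
    while step_norm(r_hi) > r_hi:
        r_hi *= 2.0

    # Bisection on the scalar equation ||h(r)|| = r.
    for _ in range(max_iter):
        r = 0.5 * (r_lo + r_hi)
        if step_norm(r) > r:
            r_lo = r
        else:
            r_hi = r
        if r_hi - r_lo < tol:
            break

    return -Q @ (g_rot / (lam + 0.5 * M * r_hi))

# Hypothetical usage: an indefinite Hessian, where a plain Newton step
# would be unreliable but the cubic model still has a minimizer.
g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, -1.0]])
h = cubic_newton_step(g, H, M=1.0)
```

The cubic term keeps the model bounded below even when `H` has negative eigenvalues, which is what lets the step exploit negative curvature and move away from saddle points.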

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research · Advanced Control Systems Optimization