A Variance-Reduced Cubic-Regularized Newton for Policy Optimization
Cheng Sun, Zhen Zhang, Shaofu Yang

TL;DR
This paper introduces VR-CR-PN, a novel second-order policy optimization algorithm in reinforcement learning that uses variance reduction and cubic regularization to improve sample complexity without relying on importance sampling.
Contribution
To the authors' knowledge, it is the first algorithm to combine Hessian-aided variance reduction with second-order policy optimization, achieving horizon-independent sample complexity and improved theoretical guarantees.
Findings
Achieves $\tilde{O}(\epsilon^{-3})$ sample complexity for second-order stationary points.
Introduces a Hessian estimator with a horizon-independent upper bound.
Improves on the previous best sample complexity of $\tilde{O}(\epsilon^{-3.5})$.
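To make the idea of Hessian-aided variance reduction concrete, here is a minimal sketch on a toy objective. It assumes the general technique of estimating a gradient difference from Hessian-vector products evaluated at a random point on the segment between two iterates; it is not the estimator defined in the paper, whose details this summary does not give. The names `grad`, `hvp`, and `hessian_aided_difference` are hypothetical, and the toy objective stands in for the policy objective.

```python
import numpy as np


def grad(x):
    # Gradient of the toy objective f(x) = 0.25 * sum(x**4) (an assumption,
    # chosen only so the exact answer is easy to check).
    return x ** 3


def hvp(x, v):
    # Hessian-vector product for the toy objective; the Hessian is diag(3 * x**2).
    return 3.0 * x ** 2 * v


def hessian_aided_difference(x_prev, x_new, n_samples, rng):
    """Monte Carlo estimate of grad(x_new) - grad(x_prev).

    Uses the identity
        grad(x_new) - grad(x_prev) = E_a[ H(x_prev + a * d) @ d ],
    with d = x_new - x_prev and a ~ Uniform[0, 1]. Every Hessian-vector
    product is evaluated at the sampled interpolation point itself, so no
    reweighting of samples taken at a different point is needed.
    """
    d = x_new - x_prev
    a = rng.uniform(0.0, 1.0, size=(n_samples, 1))
    points = x_prev + a * d  # shape (n_samples, dim)
    return hvp(points, d).mean(axis=0)
```

A variance-reduced gradient estimate would then be updated recursively as the previous estimate plus this difference. In a reinforcement learning setting the Hessian-vector products would themselves be noisy estimates from trajectories sampled under the interpolated policy parameters, which is what lets methods of this kind avoid importance sampling.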
Abstract
In this paper, we study a second-order approach to policy optimization in reinforcement learning. Existing second-order methods often suffer from suboptimal sample complexity or rely on unrealistic assumptions about importance sampling. To overcome these limitations, we propose VR-CR-PN, a variance-reduced cubic-regularized policy Newton algorithm. To the best of our knowledge, this is the first algorithm that integrates Hessian-aided variance reduction with second-order policy optimization, effectively addressing the distribution shift problem and achieving best-known sample complexity under general nonconvex conditions but without the need for importance sampling. We theoretically establish that VR-CR-PN achieves a sample complexity of $\tilde{O}(\epsilon^{-3})$ to reach an $\epsilon$-second-order stationary point, significantly improving upon the previous best result of…
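As an illustration of what a cubic-regularized Newton step computes, the sketch below solves the standard cubic subproblem $\min_h \; g^\top h + \tfrac{1}{2} h^\top H h + \tfrac{M}{6}\|h\|^3$ for a given gradient $g$ and symmetric Hessian $H$. This is the textbook form of the subproblem, assumed here for illustration; the exact subproblem, constants, and solver used by VR-CR-PN are not stated in this summary. The names `cubic_model`, `cubic_newton_step`, and the parameter `M` are hypothetical, and $g$ and $H$ are taken as exact inputs rather than variance-reduced estimates.

```python
import numpy as np


def cubic_model(h, g, H, M):
    # Value of the cubic-regularized model at step h.
    return g @ h + 0.5 * h @ H @ h + (M / 6.0) * np.linalg.norm(h) ** 3


def cubic_newton_step(g, H, M, tol=1e-10, max_iter=200):
    """Minimize the cubic model via eigendecomposition and bisection.

    The minimizer satisfies h = -(H + (M/2) r I)^{-1} g with r = ||h|| and
    H + (M/2) r I positive semidefinite. We search for r by bisection.
    The degenerate "hard case" (g orthogonal to the eigenvector of the
    most negative eigenvalue) is not handled in this sketch.
    """
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    g_eig = eigvecs.T @ g

    def step_norm(r):
        return np.linalg.norm(g_eig / (eigvals + 0.5 * M * r))

    # Smallest r for which the shifted Hessian is positive semidefinite.
    lo = max(0.0, -2.0 * eigvals[0] / M)
    hi = lo + 1.0
    while step_norm(hi) > hi:  # grow the bracket until it contains the root
        hi *= 2.0

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    return -eigvecs @ (g_eig / (eigvals + 0.5 * M * hi))
```

Because the cubic term dominates for large steps, the model is bounded below even when $H$ is indefinite, so the step can follow negative-curvature directions. That property is what allows cubic-regularized methods to target second-order stationary points rather than only first-order ones.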
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research · Advanced Control Systems Optimization
