Quasi-Newton Iteration in Deterministic Policy Gradient
Arash Bahari Kordabad, Hossein Nejatbakhsh Esfahani, Wenqi Cai, Sebastien Gros

TL;DR
This paper introduces a model-free Hessian approximation for deterministic policy gradients in reinforcement learning, enabling superlinear convergence and unifying natural policy gradient as a special case.
Contribution
It proposes a novel quasi-Newton method for deterministic policy optimization that converges faster and generalizes the natural policy gradient approach.
Findings
The approximate Hessian converges to the exact Hessian at the optimal policy.
The method achieves superlinear convergence, provided the policy parametrization is rich.
In a nonlinear example, the method shows improved convergence over the natural policy gradient.
Abstract
This paper presents a model-free approximation for the Hessian of the performance of deterministic policies to use in the context of Reinforcement Learning based on Quasi-Newton steps in the policy parameters. We show that the approximate Hessian converges to the exact Hessian at the optimal policy, and allows for a superlinear convergence in the learning, provided that the policy parametrization is rich. The natural policy gradient method can be interpreted as a particular case of the proposed method. We analytically verify the formulation in a simple linear case and compare the convergence of the proposed method with the natural policy gradient in a nonlinear example.
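To make the contrast between first-order and Newton-type parameter updates concrete, here is a minimal sketch on a toy problem. It is not the paper's method: the performance J is assumed to be a hypothetical concave quadratic in two parameters, and the exact Hessian is used in place of the paper's model-free approximation. All function names and numeric values are illustrative assumptions.

```python
# Toy illustration only: J(theta) = -0.5 * theta^T A theta + b^T theta.
# A, b, the step size, and every function name are assumptions for this sketch,
# not quantities from the paper.

def grad(theta, A, b):
    # Gradient of J: b - A theta
    return [b[i] - sum(A[i][j] * theta[j] for j in range(2)) for i in range(2)]

def solve2(M, v):
    # Solve the 2x2 linear system M x = v by Cramer's rule
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * v[0] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]

def gradient_step(theta, A, b, alpha):
    # Plain gradient ascent: theta + alpha * grad J
    g = grad(theta, A, b)
    return [theta[i] + alpha * g[i] for i in range(2)]

def newton_step(theta, A, b):
    # Newton step theta - H^{-1} grad J, with Hessian H = -A for this toy J
    g = grad(theta, A, b)
    d = solve2(A, g)
    return [theta[i] + d[i] for i in range(2)]

A = [[10.0, 0.0], [0.0, 1.0]]   # poorly conditioned curvature
b = [1.0, 1.0]
theta_star = solve2(A, b)       # maximizer of J
theta0 = [0.0, 0.0]

theta_newton = newton_step(theta0, A, b)

theta_gd = theta0
for _ in range(20):
    theta_gd = gradient_step(theta_gd, A, b, alpha=0.1)

def err(theta):
    # Largest coordinate-wise distance to the maximizer
    return max(abs(theta[i] - theta_star[i]) for i in range(2))

print("Newton error after 1 step:", err(theta_newton))
print("Gradient error after 20 steps:", err(theta_gd))
```

On a quadratic, a step that uses exact curvature reaches the maximizer at once, while gradient ascent is slowed by the poorly scaled direction. In the setting the abstract describes, the curvature term would instead come from the approximate Hessian estimated without a model.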
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Machine Learning and ELM
