Quasi-Newton Compatible Actor-Critic for Deterministic Policies

Arash Bahari Kordabad, Dean Brandner, Sebastien Gros, Sergio Lucia, and Sadegh Soudjani

arXiv:2511.09509·cs.LG·November 13, 2025

Open Access

TL;DR

This paper introduces a second-order deterministic actor-critic method for reinforcement learning that uses curvature information of the performance function to converge faster than first-order methods. The method applies to any differentiable policy class.

Contribution

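The paper develops a quadratic critic based on compatible function approximation, estimates its parameters with least-squares temporal difference learning, and uses the learned curvature information in a quasi-Newton actor update that converges faster than first-order methods.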

Findings

01 Achieves faster convergence than first-order methods.

02 Demonstrates improved performance in numerical experiments.

03 Applicable to any differentiable policy class.

Abstract

In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building on the concept of compatible function approximation for the critic, we introduce a quadratic critic that simultaneously preserves the true policy gradient and an approximation of the performance Hessian. A least-squares temporal difference learning scheme is then developed to estimate the quadratic critic parameters efficiently. This construction enables a quasi-Newton actor update using information learned by the critic, yielding faster convergence compared to first-order methods. The proposed approach is general and applicable to any differentiable policy class. Numerical examples demonstrate that the method achieves improved convergence and…
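To illustrate why curvature information can speed up the actor update, the following is a minimal sketch, not the paper's algorithm. It compares a first-order gradient-ascent step with a Newton-type step on a hypothetical ill-conditioned quadratic performance function J(θ) = -½ (θ - θ*)ᵀ A (θ - θ*). The names `A`, `theta_star`, `gradient_step`, `quasi_newton_step`, and `run_demo` are assumptions made for this example. The gradient and Hessian are computed analytically here, whereas in the paper they would be estimated from the learned quadratic critic; no critic or temporal difference learning is implemented.

```python
import numpy as np

# Hypothetical toy performance function (an assumption for illustration):
# J(theta) = -0.5 * (theta - theta_star)^T A (theta - theta_star)
A = np.diag([1.0, 100.0])           # ill-conditioned curvature
theta_star = np.array([1.0, -2.0])  # maximizer of J


def grad_J(theta):
    """Exact gradient of the toy J."""
    return -A @ (theta - theta_star)


def hess_J(theta):
    """Exact Hessian of the toy J (constant, negative definite)."""
    return -A


def gradient_step(theta, grad, lr):
    """First-order ascent step: theta + lr * grad."""
    return theta + lr * grad


def quasi_newton_step(theta, grad, hess_approx, damping=0.0):
    """Newton-type ascent step: theta - (H - damping * I)^{-1} grad.

    For maximization H should be negative definite; subtracting
    damping * I keeps the linear system well conditioned.
    """
    H = hess_approx - damping * np.eye(theta.size)
    return theta - np.linalg.solve(H, grad)


def run_demo(n_gradient_steps=50, lr=0.01):
    """Return the distance to theta_star after many first-order steps
    and after a single Newton-type step, both starting from zero."""
    theta_gd = np.zeros(2)
    for _ in range(n_gradient_steps):
        theta_gd = gradient_step(theta_gd, grad_J(theta_gd), lr)

    theta0 = np.zeros(2)
    theta_qn = quasi_newton_step(theta0, grad_J(theta0), hess_J(theta0))

    gd_err = np.linalg.norm(theta_gd - theta_star)
    qn_err = np.linalg.norm(theta_qn - theta_star)
    return gd_err, qn_err


if __name__ == "__main__":
    gd_err, qn_err = run_demo()
    print(f"gradient ascent error after 50 steps: {gd_err:.3e}")
    print(f"Newton-type error after 1 step:       {qn_err:.3e}")
```

On this toy problem the step size of gradient ascent is limited by the steepest direction, so progress along the flat direction is slow, while the curvature-scaled step treats both directions equally. With an approximate rather than exact Hessian, as in a learned critic, the improvement would generally be smaller and depends on the quality of the approximation.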

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Model Reduction and Neural Networks