Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic

Jeong Woon Lee, Kyoleen Kwak, Daeho Kim, and Hyoseok Hwang

arXiv:2601.22970 · cs.LG · February 2, 2026

TL;DR

This paper introduces a critic-centric regularization method called PAVE that stabilizes the Q-gradient field to produce smoother, more robust policies in actor-critic algorithms without altering the actor directly.

Contribution

The work establishes a theoretical link between policy smoothness and the differential geometry of the critic, and proposes PAVE, a regularization framework that acts on the critic rather than the actor to improve policy stability.
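The link between the critic's geometry and policy smoothness can be sketched with the standard implicit-function argument below. This is a reconstruction under simplifying assumptions (the actor outputs the greedy action a*(s) = argmax_a Q(s, a), Q is twice continuously differentiable, and the action Hessian is negative definite at a*(s)), not the paper's exact theorem statement. All second derivatives are evaluated at (s, a*(s)).

```latex
% First-order optimality of the greedy action, holding for every state s
\nabla_a Q\bigl(s, a^*(s)\bigr) = 0

% Differentiate with respect to s (implicit function theorem)
\nabla^2_{aa} Q \,\frac{\partial a^*}{\partial s} + \nabla^2_{as} Q = 0
\quad\Longrightarrow\quad
\frac{\partial a^*}{\partial s} = -\bigl(\nabla^2_{aa} Q\bigr)^{-1} \nabla^2_{as} Q

% Operator-norm bound; -\nabla^2_{aa} Q is symmetric positive definite
\left\| \frac{\partial a^*}{\partial s} \right\|_2
\le \frac{\bigl\| \nabla^2_{as} Q \bigr\|_2}{\lambda_{\min}\bigl(-\nabla^2_{aa} Q\bigr)}
```

The numerator is the mixed partial derivative (noise sensitivity) and the denominator is the action-space curvature (signal distinctness): a critic whose gradient field twists sharply across states, or that is nearly flat in the action direction, permits a jittery greedy policy.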

Findings

01

PAVE effectively reduces Q-gradient volatility.

02

PAVE yields policies whose smoothness is comparable to that of existing methods.

03

PAVE maintains competitive task performance without actor modifications.

Abstract

Policies learned via continuous actor-critic methods often exhibit erratic, high-frequency oscillations, making them unsuitable for physical deployment. Current approaches attempt to enforce smoothness by directly regularizing the policy's output. We argue that this approach treats the symptom rather than the cause. In this work, we theoretically establish that policy non-smoothness is fundamentally governed by the differential geometry of the critic. By applying implicit differentiation to the actor-critic objective, we prove that the sensitivity of the optimal policy is bounded by the ratio of the Q-function's mixed-partial derivative (noise sensitivity) to its action-space curvature (signal distinctness). To empirically validate this theoretical insight, we introduce PAVE (Policy-Aware Value-field Equalization), a critic-centric regularization framework that treats the critic as a…
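For a greedy policy a*(s) = argmax_a Q(s, a), differentiating the optimality condition ∂Q/∂a = 0 with respect to s gives da*/ds = -Q_as / Q_aa. With one state and one action dimension the ratio bound is therefore an equality, which makes it easy to check numerically. The sketch below is an illustration only: the toy critic `q_toy`, the helper names, the finite-difference step sizes, and the golden-section search for the greedy action are all hypothetical choices, not the paper's setup and not PAVE itself.

```python
import math


def q_toy(s: float, a: float, k: float = 3.0, h: float = 2.0) -> float:
    """Hypothetical toy critic: concave in a, with its ridge at a = sin(k * s)."""
    return -0.5 * h * (a - math.sin(k * s)) ** 2


def mixed_partial(q, s, a, eps=1e-4):
    """Central finite-difference estimate of d^2 Q / (ds da)."""
    return (q(s + eps, a + eps) - q(s + eps, a - eps)
            - q(s - eps, a + eps) + q(s - eps, a - eps)) / (4.0 * eps * eps)


def action_curvature(q, s, a, eps=1e-4):
    """Central finite-difference estimate of d^2 Q / da^2."""
    return (q(s, a + eps) - 2.0 * q(s, a) + q(s, a - eps)) / (eps * eps)


def greedy_action(q, s, lo=-2.0, hi=2.0, tol=1e-10):
    """Golden-section search for argmax_a Q(s, a); assumes Q is unimodal in a on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if q(s, c) > q(s, d):
            hi = d  # the maximizer lies in [lo, d]
        else:
            lo = c  # the maximizer lies in [c, hi]
    return 0.5 * (lo + hi)


s0, ds = 0.3, 1e-4
a0 = greedy_action(q_toy, s0)

# Critic-side quantity: |Q_as| / |Q_aa| evaluated at the greedy action.
ratio = abs(mixed_partial(q_toy, s0, a0)) / abs(action_curvature(q_toy, s0, a0))

# Policy-side quantity: finite-difference slope of the greedy action with respect to the state.
sensitivity = abs(greedy_action(q_toy, s0 + ds) - greedy_action(q_toy, s0 - ds)) / (2.0 * ds)

print(f"ratio={ratio:.4f} sensitivity={sensitivity:.4f}")
```

For this toy critic both quantities should agree and sit near 3·cos(0.9) ≈ 1.865; raising `k` (a critic whose ridge bends faster across states) or lowering `h` (a flatter critic in the action direction) increases both, matching the intuition that smoothness is governed by the critic's geometry.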

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning