Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic
Jeong Woon Lee, Kyoleen Kwak, Daeho Kim, and Hyoseok Hwang

TL;DR
This paper introduces a critic-centric regularization method called PAVE that stabilizes the Q-gradient field to produce smoother, more robust policies in actor-critic algorithms without altering the actor directly.
Contribution
The work provides a theoretical foundation linking policy smoothness to the critic’s differential geometry and proposes PAVE, a novel regularization framework focusing on the critic to enhance policy stability.
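The link between the critic's differential geometry and policy smoothness can be illustrated on a toy problem. The sketch below uses a hypothetical quadratic critic Q(s, a) = -0.5 aᵀHa + aᵀBs, which is not the paper's model or PAVE itself. For this critic, the action Hessian is Q_aa = -H and the mixed partial is Q_as = B, so the greedy action's Jacobian is H⁻¹B. The code checks that Jacobian against finite differences and against the bound ‖Q_as‖ / λ_min(-Q_aa). It also shows that increasing the action-space curvature shrinks the policy's sensitivity to the state. All function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy critic, for illustration only (not PAVE):
#   Q(s, a) = -0.5 * a^T H a + a^T B s,  H symmetric positive definite.
# Action Hessian: Q_aa = -H.  Mixed partial: Q_as = B.

def greedy_action(H, B, s):
    """a*(s) = argmax_a Q(s, a), obtained by solving grad_a Q = -H a + B s = 0."""
    return np.linalg.solve(H, B @ s)

def implicit_jacobian(H, B):
    """da*/ds = -(Q_aa)^{-1} Q_as, from differentiating grad_a Q(s, a*(s)) = 0 in s."""
    Q_aa, Q_as = -H, B
    return -np.linalg.solve(Q_aa, Q_as)

def sensitivity_bound(H, B):
    """||Q_as||_2 / lambda_min(-Q_aa): mixed-partial size over action-space curvature."""
    return np.linalg.norm(B, 2) / np.linalg.eigvalsh(H).min()

def finite_diff_jacobian(H, B, s, eps=1e-6):
    """Central-difference estimate of da*/ds, one state coordinate at a time."""
    cols = []
    for i in range(s.size):
        e = np.zeros_like(s)
        e[i] = eps
        cols.append((greedy_action(H, B, s + e) - greedy_action(H, B, s - e)) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
act_dim, state_dim = 2, 3
B = rng.normal(size=(act_dim, state_dim))
M = rng.normal(size=(act_dim, act_dim))
s = rng.normal(size=state_dim)

results = []
for curvature in (0.1, 1.0, 10.0):
    # SPD Hessian whose overall scale is set by `curvature`
    H = curvature * np.eye(act_dim) + 0.05 * M @ M.T
    J = implicit_jacobian(H, B)
    fd_ok = np.allclose(J, finite_diff_jacobian(H, B, s), atol=1e-6)
    results.append((curvature, np.linalg.norm(J, 2), sensitivity_bound(H, B), fd_ok))

for curvature, j_norm, bound, fd_ok in results:
    print(f"curvature={curvature:5.1f}  ||da*/ds||={j_norm:.3f}  bound={bound:.3f}  fd_match={fd_ok}")
```

In this toy setting, a flatter critic in action space (small curvature) lets the same mixed partial B produce a much larger swing in the greedy action. This matches the intuition that the critic's shape, not only the actor, governs smoothness.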
Findings
PAVE reduces Q-gradient volatility effectively.
PAVE achieves smooth policies comparable to existing methods.
PAVE maintains competitive task performance without actor modifications.
Abstract
Policies learned via continuous actor-critic methods often exhibit erratic, high-frequency oscillations, making them unsuitable for physical deployment. Current approaches attempt to enforce smoothness by directly regularizing the policy's output. We argue that this approach treats the symptom rather than the cause. In this work, we theoretically establish that policy non-smoothness is fundamentally governed by the differential geometry of the critic. By applying implicit differentiation to the actor-critic objective, we prove that the sensitivity of the optimal policy is bounded by the ratio of the Q-function's mixed-partial derivative (noise sensitivity) to its action-space curvature (signal distinctness). To empirically validate this theoretical insight, we introduce PAVE (Policy-Aware Value-field Equalization), a critic-centric regularization framework that treats the critic as a…
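The implicit-differentiation argument the abstract describes can be sketched with the standard implicit function theorem. This sketch assumes that Q is twice differentiable, that the actor's output a*(s) is an interior maximizer of Q(s, ·), and that the action Hessian is negative definite there. These assumptions are ours; the paper's exact statement, norm, and constants may differ.

```latex
\begin{aligned}
\nabla_a Q\bigl(s, a^*(s)\bigr) = 0
&\;\Longrightarrow\;
\nabla^2_{aa} Q \,\frac{\partial a^*}{\partial s} + \nabla^2_{as} Q = 0 \\
&\;\Longrightarrow\;
\frac{\partial a^*}{\partial s} = -\bigl(\nabla^2_{aa} Q\bigr)^{-1} \nabla^2_{as} Q \\
&\;\Longrightarrow\;
\left\| \frac{\partial a^*}{\partial s} \right\|_2
\le \frac{\bigl\| \nabla^2_{as} Q \bigr\|_2}{\lambda_{\min}\!\bigl(-\nabla^2_{aa} Q\bigr)}
\end{aligned}
```

The numerator is the mixed partial ("noise sensitivity") and the denominator is the action-space curvature ("signal distinctness"). The last step uses ‖A⁻¹‖₂ = 1/λ_min(A) for a symmetric positive definite A = -∇²_aa Q. A critic that is nearly flat in action space therefore amplifies any state dependence of its action gradient into large changes in the policy.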
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
