Soft Deterministic Policy Gradient with Gaussian Smoothing
Hyunjun Na, Donghwan Lee

TL;DR
This paper introduces Soft DDPG, a deep reinforcement learning algorithm that applies Gaussian smoothing to the Bellman equation so that policy gradients stay well-defined when sparse or discrete rewards make the critic non-smooth in the action.
Contribution
It defines an action-value function through a Gaussian-smoothed Bellman equation and derives a soft deterministic policy gradient (Soft-DPG) that does not depend explicitly on critic action-gradients, so it tolerates non-smooth critics.
Findings
Soft DDPG remains competitive in dense-reward tasks.
It outperforms standard DDPG in discretized-reward environments.
The method ensures well-defined gradients even with non-smooth Q-functions.
Abstract
Deterministic policy gradient (DPG) is widely utilized for continuous control; however, it inherently relies on the differentiability of the critic with respect to the action during policy updates. This assumption is violated in practical control problems involving sparse or discrete rewards, leading to ill-defined policy gradients and unstable learning. To address these challenges, we propose a principled alternative based on a smoothed Bellman equation formulated via Gaussian smoothing. Specifically, we define a novel action-value function based on a smoothed Bellman equation and derive the soft deterministic policy gradient (Soft-DPG). Our formulation eliminates explicit dependence on critic action-gradients and ensures that the gradient remains well-defined even for non-smooth Q-functions. We instantiate this framework into a deep reinforcement learning algorithm, which we call soft…
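The general idea behind Gaussian smoothing can be illustrated with a small sketch. It is not the paper's algorithm: it only uses the standard identity that, for Q_sigma(a) = E[Q(a + eps)] with eps ~ N(0, sigma^2), the gradient is dQ_sigma/da = E[Q(a + eps) * eps] / sigma^2, which needs evaluations of Q but no derivative of Q. The names `step_q` and `smoothed_q_and_grad`, the one-dimensional action, and the mean baseline are assumptions made for this illustration.

```python
import numpy as np


def step_q(action):
    """Hypothetical non-smooth critic: 1.0 if action > 0.5, else 0.0.

    Its derivative is zero almost everywhere and undefined at 0.5,
    so a standard action-gradient carries no learning signal.
    """
    return (np.asarray(action) > 0.5).astype(float)


def smoothed_q_and_grad(q_fn, action, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of the Gaussian-smoothed value and its gradient.

    Uses only evaluations of q_fn at perturbed actions, never its derivative.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n_samples)  # Gaussian perturbations
    q_vals = q_fn(action + eps)                   # critic at perturbed actions
    q_smooth = q_vals.mean()                      # estimate of E[Q(a + eps)]
    # Subtracting the sample mean reduces variance; since E[eps] = 0 the
    # expectation is essentially unchanged (assumption of this sketch).
    grad = np.mean((q_vals - q_smooth) * eps) / sigma**2
    return q_smooth, grad


q_smooth, grad = smoothed_q_and_grad(step_q, action=0.5, sigma=0.2)
print(q_smooth, grad)
```

For this toy critic the smoothed value at the step location is analytically 0.5 and the smoothed gradient is the Gaussian density at zero, 1 / (sigma * sqrt(2 * pi)), roughly 1.99 for sigma = 0.2. The Monte Carlo estimates should land close to those values, whereas differentiating `step_q` directly gives nothing usable.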
