Mollification Effects of Policy Gradient Methods
Tao Wang, Sylvia Herbert, Sicun Gao

TL;DR
This paper analyzes how policy gradient methods smooth non-smooth optimization landscapes in deep reinforcement learning, revealing both their benefits in optimization and their limitations due to deviation from the original problem.
Contribution
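The paper develops a rigorous framework that interprets policy gradient methods as mollification of the optimization landscape and shows that they are equivalent to solving backward heat equations. Because backward heat equations are ill-posed, this equivalence exposes a fundamental challenge to using policy gradients under stochasticity.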
Findings
Policy gradients mollify non-smooth landscapes, aiding optimization.
The smoother the stochastic objective becomes, the further it deviates from the original problem.
Experimental results show both benefits and drawbacks of mollification in practice.
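The trade-off in these findings can be illustrated with a toy example. The following is a minimal sketch, not the paper's experimental setup: the one-dimensional objective `J`, the helper names `mollify` and `central_diff`, and the grid-based integration are all assumptions made for illustration. It smooths a non-smooth objective with Gaussian noise of standard deviation `sigma` and measures how far the smoothed value at the optimum drifts from the original.

```python
import numpy as np

def mollify(J, theta, sigma, n=8001, width=8.0):
    """Gaussian-smoothed objective E[J(theta + sigma * Z)] with Z ~ N(0, 1).

    The expectation is approximated on a fixed grid (an illustrative choice),
    so the result is deterministic.
    """
    eps = np.linspace(-width, width, n)
    w = np.exp(-0.5 * eps**2)
    w /= w.sum()  # normalized Gaussian weights on the grid
    return float(np.sum(w * J(theta + sigma * eps)))

def J(theta):
    """Hypothetical non-smooth objective: a kink at the maximizer theta = 0."""
    return -np.abs(theta)

def central_diff(f, x, h=1e-3):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

sigmas = [0.1, 0.5, 1.0]

# Deviation of the smoothed objective from the original at the optimum.
gaps = [abs(mollify(J, 0.0, s) - J(0.0)) for s in sigmas]

# Slope of the smoothed objective around the kink for sigma = 0.5.
slopes = [central_diff(lambda t: mollify(J, t, 0.5), x) for x in (-0.2, 0.0, 0.2)]

for s, g in zip(sigmas, gaps):
    print(f"sigma={s:.1f}  gap at optimum={g:.4f}")
print("slopes near the kink:", [round(v, 4) for v in slopes])
```

For this particular objective the gap at the optimum is sigma * sqrt(2/pi) in closed form, so it grows linearly with the noise level, while the slope of the smoothed objective passes continuously through zero instead of jumping from +1 to -1 at the kink.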
Abstract
Policy gradient methods have enabled deep reinforcement learning (RL) to approach challenging continuous control problems, even when the underlying systems involve highly nonlinear dynamics that generate complex non-smooth optimization landscapes. We develop a rigorous framework for understanding how policy gradient methods mollify non-smooth optimization landscapes to enable effective policy search, as well as the downside of it: while making the objective function smoother and easier to optimize, the stochastic objective deviates further from the original problem. We demonstrate the equivalence between policy gradient methods and solving backward heat equations. Following the ill-posedness of backward heat equations from PDE theory, we present a fundamental challenge to the use of policy gradient under stochasticity. Moreover, we make the connection between this limitation and the…
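The heat-equation connection mentioned in the abstract can be made concrete with a standard fact about Gaussian smoothing. The block below is a sketch under stated assumptions rather than the paper's own derivation: $J$ is a hypothetical objective over parameters $\theta \in \mathbb{R}^d$, the smoothing is isotropic Gaussian with standard deviation $\sigma$, and the time variable $t = \sigma^2/2$ is a notational choice made here.

```latex
\begin{aligned}
u(\theta, t) &= \mathbb{E}_{Z \sim \mathcal{N}(0, I)}\!\left[ J\!\left(\theta + \sqrt{2t}\, Z\right) \right]
              = (G_t * J)(\theta),
\qquad
G_t(x) = (4\pi t)^{-d/2} \exp\!\left(-\frac{\lVert x \rVert^2}{4t}\right), \\[4pt]
\partial_t u(\theta, t) &= \Delta_\theta u(\theta, t),
\qquad
u(\theta, 0) = J(\theta),
\qquad
t = \tfrac{1}{2}\sigma^2 .
\end{aligned}
```

Under this reading, increasing $\sigma$ runs the heat equation forward in time and smooths the objective, while recovering $J$ from its smoothed version (shrinking $\sigma$ toward zero) corresponds to running the equation backward, which is the classically ill-posed direction.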
