Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process
Xiangxin Zhou, Liang Wang, Yichi Zhou

TL;DR
This paper introduces a method to stabilize policy gradients in stochastic differential equations by enforcing consistency with the perturbation process, improving training stability and sample efficiency in high-dimensional generative models.
Contribution
We propose a novel constraint ensuring SDEs are consistent with their perturbation processes, enhancing policy gradient stability and applicability in complex generative tasks.
Findings
Achieved a Vina score of -9.07 on CrossDocked2020.
Improved stability and efficiency in training SDE-based generative models.
Enhanced performance in structure-based drug design tasks.
Abstract
We consider generating samples with high rewards by optimizing stochastic differential equations (SDEs) parameterized by deep neural networks, a highly expressive class of generative models, using policy gradient, the leading algorithm in reinforcement learning. However, because the policy gradient is estimated on a finite set of trajectories, it can be ill-defined when applied to SDEs, and the policy's behavior in data-scarce regions may be uncontrolled. This compromises the stability of policy gradients and hurts sample complexity. To address these issues, we propose constraining the SDE to be consistent with its associated perturbation process. Because the perturbation process covers the entire space and is easy to sample, this constraint mitigates both problems. Our framework offers a general approach allowing for a…
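To make "policy gradient estimated on a finite set of trajectories" concrete, here is a minimal sketch of a plain score-function (REINFORCE) estimator on a toy one-dimensional SDE discretized with Euler-Maruyama. It is not the method proposed in the paper and does not include the consistency constraint; the constant drift `theta`, the function name `reinforce_gradient`, and all default parameters are illustrative assumptions.

```python
import math
import random


def reinforce_gradient(theta, reward_fn, n_traj=20000, n_steps=20,
                       dt=0.05, sigma=1.0, x0=0.0, seed=0):
    """Score-function estimate of d/dtheta E[reward(x_T)] for the toy SDE
    dx = theta dt + sigma dW (hypothetical example, constant drift)."""
    rng = random.Random(seed)
    rewards, scores = [], []
    for _ in range(n_traj):
        x, score = x0, 0.0
        for _ in range(n_steps):
            eps = rng.gauss(0.0, 1.0)
            # Euler-Maruyama step: Gaussian transition with mean x + theta*dt
            # and variance sigma^2 * dt.
            x += theta * dt + sigma * math.sqrt(dt) * eps
            # Gradient of the transition log-density with respect to theta.
            score += math.sqrt(dt) * eps / sigma
        rewards.append(reward_fn(x))
        scores.append(score)
    # Subtract the mean reward as a baseline to reduce variance.
    baseline = sum(rewards) / n_traj
    return sum((r - baseline) * s for r, s in zip(rewards, scores)) / n_traj


# With reward(x_T) = x_T, the exact gradient equals the time horizon
# n_steps * dt, so the estimate can be checked against a known value.
grad = reinforce_gradient(theta=0.5, reward_fn=lambda x: x)
```

The estimate depends only on the sampled trajectories, so regions of state space that no trajectory visits contribute nothing to it; this is the data-scarcity issue the abstract describes.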
Taxonomy
Topics: Climate Change Policy and Economics · Simulation Techniques and Applications · Stochastic processes and financial applications
Methods: Sparse Evolutionary Training · Focus
