Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process

Xiangxin Zhou; Liang Wang; Yichi Zhou

arXiv:2403.04154·cs.LG·June 27, 2024·1 cites

TL;DR

This paper introduces a method to stabilize policy gradients in stochastic differential equations by enforcing consistency with the perturbation process, improving training stability and sample efficiency in high-dimensional generative models.

Contribution

We propose a novel constraint ensuring SDEs are consistent with their perturbation processes, enhancing policy gradient stability and applicability in complex generative tasks.

Findings

01. Achieved a Vina score of -9.07 on CrossDocked2020.

02. Improved stability and efficiency in training SDE-based generative models.

03. Enhanced performance in structure-based drug design tasks.

Abstract

To generate samples with high rewards, we focus on optimizing stochastic differential equations (SDEs) parameterized by deep neural networks, an advanced and highly expressive class of generative models, using policy gradient, the leading algorithm in reinforcement learning. However, when policy gradients are applied to SDEs, the gradient is estimated on a finite set of trajectories, so it can be ill-defined, and the policy's behavior in data-scarce regions may be uncontrolled. This challenge compromises the stability of policy gradients and hurts sample complexity. To address these issues, we propose constraining the SDE to be consistent with its associated perturbation process. Because the perturbation process covers the entire space and is easy to sample, this constraint mitigates the aforementioned problems. Our framework offers a general approach allowing for a…
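To make the finite-trajectory issue concrete, here is a minimal sketch of a plain REINFORCE-style policy gradient on an Euler-Maruyama discretization of a toy one-dimensional SDE. Everything in it is an assumption made for illustration and is not taken from the paper: the drift `theta - x`, the terminal reward `-(x_T - target)^2`, the function name `reinforce_gradient`, and all default values. It shows only the baseline estimator the abstract refers to, not the proposed consistency constraint.

```python
import numpy as np

def reinforce_gradient(theta, n_traj, target=1.0, n_steps=20, dt=0.05,
                       sigma=1.0, seed=0):
    """Monte Carlo policy-gradient estimate for a toy SDE (illustrative only).

    Assumed dynamics: dx = (theta - x) dt + sigma dW, started at x = 0.
    Assumed reward:   R = -(x_T - target)^2.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_traj)
    score = np.zeros(n_traj)  # d/dtheta of each trajectory's log-probability
    for _ in range(n_steps):
        eps = rng.standard_normal(n_traj)
        drift = theta - x
        # Euler-Maruyama step: a Gaussian transition with mean x + drift * dt
        # and variance sigma^2 * dt.
        x = x + drift * dt + sigma * np.sqrt(dt) * eps
        # Score of that Gaussian transition w.r.t. theta
        # (the drift's derivative w.r.t. theta is 1 here).
        score += np.sqrt(dt) * eps / sigma
    reward = -(x - target) ** 2
    advantage = reward - reward.mean()  # mean-reward baseline to cut variance
    return float(np.mean(advantage * score))

if __name__ == "__main__":
    # The estimate is built from a finite set of trajectories, so small
    # batches give noticeably different answers from seed to seed.
    for n in (16, 256, 20000):
        estimates = [reinforce_gradient(0.0, n, seed=s) for s in range(5)]
        print(n, np.round(estimates, 3))
```

Running the script prints five estimates per batch size; the spread across seeds shrinks as the number of trajectories grows, which is the sampling noise that the abstract identifies as a source of instability.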

Taxonomy

Topics: Climate Change Policy and Economics · Simulation Techniques and Applications · Stochastic processes and financial applications

Methods: Sparse Evolutionary Training · Focus