An Efficient On-Policy Deep Learning Framework for Stochastic Optimal Control

Mengjian Hua; Mathieu Laurière; Eric Vanden-Eijnden

arXiv:2410.05163·cs.LG·May 14, 2025

Open Access

TL;DR

This paper introduces an on-policy deep learning algorithm for stochastic optimal control that uses the Girsanov theorem to compute gradients directly, improving computational speed and memory use when optimizing control policies in high dimensions.

Contribution

The method uses the Girsanov theorem to compute on-policy gradients of the control objective directly, avoiding both backpropagation through the stochastic differential equation and the solution of an adjoint problem, which improves efficiency and scalability.

Findings

01 Substantial speedup over existing approaches.

02 Improved memory efficiency, including on high-dimensional problems.

03 Successful application to sampling from unnormalized distributions and to fine-tuning pre-trained diffusion models.

Abstract

We present a novel on-policy algorithm for solving stochastic optimal control (SOC) problems. By leveraging the Girsanov theorem, our method directly computes on-policy gradients of the SOC objective without expensive backpropagation through stochastic differential equations or adjoint problem solutions. This approach significantly accelerates the optimization of neural network control policies while scaling efficiently to high-dimensional problems and long time horizons. We evaluate our method on classical SOC benchmarks as well as applications to sampling from unnormalized distributions via Schrödinger-Föllmer processes and fine-tuning pre-trained diffusion models. Experimental results demonstrate substantial improvements in both computational speed and memory efficiency compared to existing approaches.
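To make the idea in the abstract concrete, here is a minimal sketch of a Girsanov-style (likelihood-ratio) gradient estimator on a toy problem. It is not the authors' implementation. The dynamics dX = u dt + dW, the quadratic costs, the affine policy u_theta(x) = theta[0] + theta[1]*x, and the function name girsanov_gradient are all assumptions chosen for illustration. The point it shows is that the gradient is assembled from quantities accumulated along forward simulations of the current policy, with no differentiation through the simulated trajectory.

```python
import numpy as np

def girsanov_gradient(theta, n_paths=200_000, n_steps=20, T=1.0, seed=0):
    """Monte Carlo estimate of grad_theta J for the toy problem (assumed here)
        dX = u_theta(X) dt + dW,  X_0 = 0,
        J  = E[ int_0^T 0.5 * u^2 dt + 0.5 * (X_T - 1)^2 ],
    with the hypothetical affine policy u_theta(x) = theta[0] + theta[1] * x."""
    rng = np.random.default_rng(seed)
    h = T / n_steps
    x = np.zeros(n_paths)
    running = np.zeros(n_paths)        # accumulates int 0.5 * u^2 dt
    direct = np.zeros((n_paths, 2))    # accumulates int u * grad_theta(u) dt
    score = np.zeros((n_paths, 2))     # accumulates int grad_theta(u) dW
    for _ in range(n_steps):
        u = theta[0] + theta[1] * x
        # Gradient of the policy output w.r.t. theta, evaluated at the current state.
        grad_u = np.stack([np.ones(n_paths), x], axis=1)
        dw = rng.normal(0.0, np.sqrt(h), size=n_paths)
        running += 0.5 * u**2 * h
        direct += (u * h)[:, None] * grad_u
        score += grad_u * dw[:, None]
        x = x + u * h + dw             # Euler-Maruyama step under the current policy
    cost = running + 0.5 * (x - 1.0) ** 2
    # Explicit theta-dependence of the running cost, plus cost times the
    # score of the path density (the term the Girsanov theorem supplies).
    return (direct + cost[:, None] * score).mean(axis=0)

if __name__ == "__main__":
    print(girsanov_gradient(np.array([0.0, 0.0])))
```

For this toy problem the first gradient component at theta = (0, 0) can be computed by hand and equals -1, so the printed estimate should land close to that value. A neural-network policy would replace the hand-written grad_u with parameter gradients of the network output at each visited state; how the paper organizes that computation is not described on this page.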

Taxonomy

Topics: Model Reduction and Neural Networks · Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion