SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation

Dmitri Goloubentsev; Natalija Karpichina

arXiv:2605.06570 · cs.LG · May 8, 2026

TL;DR

SNAPO embeds a neural policy inside a known, differentiable simulator and computes exact gradients and input sensitivities in a single adjoint pass, targeting sequential optimal control problems under uncertainty.

Contribution

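The paper introduces a policy optimization method built on differentiable simulation: hard constraints are replaced with smooth approximations, and exact gradients of the objective with respect to all policy parameters and all inputs are computed in a single adjoint pass.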

Findings

01. Natural gas storage: training in under a minute, with full sensitivities.

02. Pension fund management: significant speedup in sensitivity computation.

03. Pharmaceutical manufacturing: fast cross-unit sensitivities at minimal computational cost.

Abstract

Many real-world problems require sequential decisions under uncertainty: when to inject or withdraw gas from storage, how to rebalance a pension portfolio each month, what temperature profile to run through a pharmaceutical reactor chain. Dynamic programming solves small instances exactly but scales exponentially in state dimensions. Black-box reinforcement learning handles high-dimensional states but trains slowly and produces no sensitivities. We introduce SNAPO (Smooth Neural Adjoint Policy Optimization), a framework that embeds a neural policy inside a known, differentiable simulator, replaces hard constraints with smooth approximations, and computes exact gradients of the objective with respect to all policy parameters and all inputs in a single adjoint pass. We demonstrate SNAPO on three domains: natural gas storage (training in under a minute, 365 forward curve sensitivities at…
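To make the three ingredients named in the abstract concrete (a policy embedded in a known simulator, smooth replacements for hard constraints, and one adjoint pass for all gradients), here is a minimal sketch in plain Python. It is not the authors' implementation. Everything in it is an assumption chosen for illustration: the toy storage dynamics, the one-neuron `tanh` policy standing in for a neural network, the softplus-based `smooth_clip`, the function name `value_and_adjoints`, and all parameter values.

```python
import math

def softplus(z, k):
    """Smooth version of max(z, 0) with sharpness k, written to avoid overflow."""
    y = k * z
    return (max(y, 0.0) + math.log1p(math.exp(-abs(y)))) / k

def sigmoid(y):
    return 0.5 * (1.0 + math.tanh(0.5 * y))

def smooth_clip(x, lo, hi, k):
    """Smooth stand-in for the hard constraint lo <= x <= hi (assumed form)."""
    return lo + softplus(x - lo, k) - softplus(x - hi, k)

def smooth_clip_grad(x, lo, hi, k):
    """Derivative of smooth_clip with respect to x."""
    return sigmoid(k * (x - lo)) - sigmoid(k * (x - hi))

def value_and_adjoints(theta, prices, s0=0.5, cap=1.0, max_rate=0.3, k=20.0):
    """Toy storage problem: returns the objective J, dJ/dtheta, and dJ/dprices."""
    w, v, b = theta
    # Forward pass: roll the policy through the simulator, storing intermediates.
    s, zs, xs, J = [s0], [], [], 0.0
    for p in prices:
        z = w * s[-1] + v * p + b              # policy pre-activation
        x = s[-1] + max_rate * math.tanh(z)    # proposed inventory level
        s_next = smooth_clip(x, 0.0, cap, k)   # smoothed capacity constraint
        J -= p * (s_next - s[-1])              # pay to inject, earn to withdraw
        zs.append(z); xs.append(x); s.append(s_next)
    # Adjoint (reverse) pass: one backward sweep yields every sensitivity.
    gw = gv = gb = 0.0
    gp = [0.0] * len(prices)
    s_bar = 0.0                                # dJ/ds_{t+1} from later steps
    for t in reversed(range(len(prices))):
        p = prices[t]
        g = s_bar - p                          # add the cashflow's dependence on s_{t+1}
        gp[t] -= s[t + 1] - s[t]               # direct price term in the cashflow
        x_bar = g * smooth_clip_grad(xs[t], 0.0, cap, k)
        z_bar = x_bar * max_rate * (1.0 - math.tanh(zs[t]) ** 2)
        gw += z_bar * s[t]
        gv += z_bar * p
        gb += z_bar
        gp[t] += z_bar * v                     # price also enters through the policy
        s_bar = x_bar + z_bar * w + p          # dJ/ds_t, passed to the previous step
    return J, (gw, gv, gb), gp

if __name__ == "__main__":
    curve = [3.0 + math.sin(2.0 * math.pi * t / 12.0) for t in range(12)]
    J, g_theta, g_curve = value_and_adjoints((0.1, -0.5, 1.5), curve)
    print(J, g_theta, g_curve)
```

In this sketch the backward sweep costs about as much as one forward simulation, yet it returns the gradient for every policy parameter and one sensitivity per point on the price curve. That is the property that makes a per-day forward-curve sensitivity vector cheap to obtain. A gradient-based optimizer would use `g_theta` to update the policy, while `g_curve` is reported as the sensitivity output.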
