SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation
Dmitri Goloubentsev, Natalija Karpichina

TL;DR
SNAPO embeds a neural policy inside a known, differentiable simulator and computes exact gradients of the objective with respect to all policy parameters and all inputs in a single adjoint pass, so optimal control problems can be trained and analyzed for sensitivities at the same time.
Contribution
It introduces a policy optimization method based on differentiable simulation: hard constraints are replaced with smooth approximations, and exact gradients and sensitivities are computed efficiently with adjoint methods.
Findings
Natural gas storage: training in under a minute, with sensitivities to all 365 forward curve points.
Significant speedup in sensitivity computation for pension fund management.
Fast cross-unit sensitivities in pharmaceutical manufacturing with minimal computational cost.
Abstract
Many real-world problems require sequential decisions under uncertainty: when to inject or withdraw gas from storage, how to rebalance a pension portfolio each month, what temperature profile to run through a pharmaceutical reactor chain. Dynamic programming solves small instances exactly but scales exponentially in state dimensions. Black-box reinforcement learning handles high-dimensional states but trains slowly and produces no sensitivities. We introduce SNAPO (Smooth Neural Adjoint Policy Optimization), a framework that embeds a neural policy inside a known, differentiable simulator, replaces hard constraints with smooth approximations, and computes exact gradients of the objective with respect to all policy parameters and all inputs in a single adjoint pass. We demonstrate SNAPO on three domains: natural gas storage (training in under a minute, 365 forward curve sensitivities at…
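The mechanism described in the abstract (a neural policy inside a differentiable simulator, smooth stand-ins for hard constraints, and one reverse pass that returns gradients for both parameters and inputs) can be illustrated with a minimal JAX sketch. This is not the authors' implementation. The toy storage model, the deterministic forward curve, the softplus-based `soft_clip`, the network size, and every constant and function name below are assumptions chosen for illustration, and JAX's reverse-mode autodiff stands in for the adjoint pass.

```python
import jax
import jax.numpy as jnp

# All constants are hypothetical and chosen only for illustration.
T = 12              # number of decision steps
CAP = 1.0           # storage capacity
RATE = 0.25         # maximum injection or withdrawal per step
PRICE_SCALE = 20.0  # rough price level used to normalize the price feature


def soft_clip(x, lo, hi, beta=50.0):
    """Smooth surrogate for clip(x, lo, hi), built from two softplus terms."""
    sp = lambda z: jax.nn.softplus(beta * z) / beta
    return lo + sp(x - lo) - sp(x - hi)


def init_params(key, hidden=8):
    """Parameters of a one-hidden-layer policy network."""
    k1, k2 = jax.random.split(key)
    return {
        "W1": 0.5 * jax.random.normal(k1, (3, hidden)),
        "b1": jnp.zeros(hidden),
        "W2": 0.5 * jax.random.normal(k2, (hidden,)),
        "b2": jnp.zeros(()),
    }


def policy(params, features):
    """Map state features to a desired trade in [-RATE, RATE]."""
    h = jnp.tanh(features @ params["W1"] + params["b1"])
    return RATE * jnp.tanh(h @ params["W2"] + params["b2"])


def total_profit(params, curve):
    """Roll the policy through the simulator and sum the cash flows."""

    def step(inventory, inputs):
        t, price = inputs
        features = jnp.stack([inventory / CAP, price / PRICE_SCALE - 1.0, t / T])
        desired = policy(params, features)
        # Smooth capacity constraint in place of a hard clip to [0, CAP].
        next_inventory = soft_clip(inventory + desired, 0.0, CAP)
        traded = next_inventory - inventory  # > 0 injects (buys), < 0 withdraws (sells)
        return next_inventory, -traded * price

    steps = jnp.arange(T, dtype=jnp.float32)
    start = jnp.array(0.5 * CAP, dtype=jnp.float32)
    _, cash_flows = jax.lax.scan(step, start, (steps, curve))
    return jnp.sum(cash_flows)


# Hypothetical seasonal forward curve with one price per step.
curve = 20.0 + 5.0 * jnp.cos(2.0 * jnp.pi * jnp.arange(T) / T)
params = init_params(jax.random.PRNGKey(0))

# One reverse-mode pass returns the objective, the gradient with respect to
# every policy parameter, and the sensitivity to every forward curve point.
value_and_grads = jax.jit(jax.value_and_grad(total_profit, argnums=(0, 1)))
profit, (param_grads, curve_sens) = value_and_grads(params, curve)
```

In this sketch `param_grads` would feed a gradient-based optimizer, while `curve_sens` holds one sensitivity per forward curve point. Both come out of the same backward pass, so the cost of the sensitivities does not grow with the number of curve points the way finite-difference bumping would.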
