The Vanishing Gradient Problem for Stiff Neural Differential Equations
Colby Fronk, Linda Petzold

TL;DR
This paper shows that vanishing gradients in stiff neural differential equations are not an artifact of any particular solver but a universal feature of all A-stable and L-stable numerical integration schemes, which hinders training and parameter estimation.
Contribution
It provides a theoretical analysis showing the fundamental limitation of all A-stable methods in preserving gradients in stiff neural ODEs, supported by explicit formulas and rigorous proofs.
Findings
Vanishing gradients occur universally in stiff neural ODEs with A-stable schemes.
The derivative of the stability function, which governs the sensitivities, decays to zero for large stiffness |z|; the slowest possible rate of decay is O(|z|^{-1}).
All A-stable methods suppress parameter gradients in stiff regimes.
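A standard way to see where the stability function enters these findings is the scalar linear test equation. The block below is a sketch under that assumption (linear test problem, fixed step size), not the paper's general argument; the symbols \lambda, h, and y_n are introduced here for illustration only.

```latex
% Test equation and one step of a one-step method with stability function R
y' = \lambda y, \qquad y_{n+1} = R(z)\, y_n, \qquad z = h\lambda .

% Differentiating one step with respect to the parameter \lambda (chain rule, dz/d\lambda = h)
\frac{\partial y_{n+1}}{\partial \lambda}
  = R(z)\, \frac{\partial y_n}{\partial \lambda} + h\, R'(z)\, y_n .
```

The only place new information about \lambda enters the sensitivity in a given step is the term h R'(z) y_n, so if R'(z) shrinks as |z| grows, the gradient contribution of that step shrinks with it.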
Abstract
Gradient-based optimization of neural differential equations and other parameterized dynamical systems fundamentally relies on the ability to differentiate numerical solutions with respect to model parameters. In stiff systems, it has been observed that sensitivities to parameters controlling fast-decaying modes become vanishingly small during training, leading to optimization difficulties. In this paper, we show that this vanishing gradient phenomenon is not an artifact of any particular method, but a universal feature of all A-stable and L-stable stiff numerical integration schemes. We analyze the rational stability function for general stiff integration schemes and demonstrate that the relevant parameter sensitivities, governed by the derivative of the stability function, decay to zero for large stiffness. Explicit formulas for common stiff integration schemes are provided, which…
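The mechanism described in the abstract can be checked numerically on two textbook A-stable schemes. The sketch below is not taken from the paper: it assumes the scalar test equation y' = λy with z = hλ, uses the well-known stability functions of backward Euler, R(z) = 1/(1 - z), and of the trapezoidal rule, R(z) = (1 + z/2)/(1 - z/2), and all function names are hypothetical.

```python
# Illustrative sketch (not the paper's code): stability functions R(z) and
# their derivatives R'(z) for two textbook A-stable schemes, with z = h * lambda.

def backward_euler_R(z):
    """Stability function of backward Euler: R(z) = 1 / (1 - z)."""
    return 1.0 / (1.0 - z)

def backward_euler_dR(z):
    """Analytic derivative: R'(z) = 1 / (1 - z)^2."""
    return 1.0 / (1.0 - z) ** 2

def trapezoid_R(z):
    """Stability function of the trapezoidal rule: R(z) = (1 + z/2) / (1 - z/2)."""
    return (1.0 + z / 2.0) / (1.0 - z / 2.0)

def trapezoid_dR(z):
    """Analytic derivative: R'(z) = 1 / (1 - z/2)^2."""
    return 1.0 / (1.0 - z / 2.0) ** 2

def finite_difference(f, z, eps=1e-6):
    """Central difference, used only to sanity-check the analytic derivatives."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

if __name__ == "__main__":
    # Increasingly stiff values of z on the negative real axis.
    for z in (-1.0, -10.0, -100.0, -1000.0):
        print(
            f"z = {z:8.1f}   "
            f"backward Euler R' = {backward_euler_dR(z):.3e}   "
            f"trapezoid R' = {trapezoid_dR(z):.3e}"
        )
```

In both cases R'(z) shrinks toward zero as z becomes more negative, even though the stability functions themselves behave differently at infinity: the backward Euler R(z) tends to 0 (L-stable), while the trapezoidal R(z) tends to -1 (A-stable but not L-stable).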
Taxonomy
Topics: Numerical methods for differential equations · Control and Stability of Dynamical Systems · Model Reduction and Neural Networks
