ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs
Amir Gholami, Kurt Keutzer, George Biros

TL;DR
ANODE introduces an adjoint-based neural ODE framework that provides unconditionally accurate gradients and reduces memory costs, addressing stability issues of previous methods in training neural ODEs.
Contribution
The paper proposes ANODE, a novel adjoint-based method for neural ODEs that ensures numerical stability and gradient accuracy while maintaining low memory usage.
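To make "adjoint-based gradients" concrete, here is a toy illustration. It is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical scalar ODE dz/dt = tanh(θz), a forward Euler solver, and a loss 0.5·z_N². It computes dL/dθ by propagating an adjoint backward through the stored Euler steps, which is a discrete adjoint in the discretize-then-optimize style, and then checks the result against central finite differences. All function names are hypothetical.

```python
import math

# Hypothetical toy model (assumption): dz/dt = tanh(theta * z), scalar state.

def euler_forward(z0, theta, h, n_steps):
    """Forward Euler; returns the full trajectory [z_0, ..., z_N]."""
    traj = [z0]
    z = z0
    for _ in range(n_steps):
        z = z + h * math.tanh(theta * z)
        traj.append(z)
    return traj

def loss(z0, theta, h, n_steps):
    """Toy loss L = 0.5 * z_N**2 on the discretized forward pass."""
    z_final = euler_forward(z0, theta, h, n_steps)[-1]
    return 0.5 * z_final * z_final

def discrete_adjoint_grad(z0, theta, h, n_steps):
    """Exact dL/dtheta of the *discrete* Euler scheme (discretize-then-optimize)."""
    traj = euler_forward(z0, theta, h, n_steps)  # stores N_t + 1 activations
    adj = traj[-1]                               # dL/dz_N for L = 0.5 * z_N**2
    grad_theta = 0.0
    for n in reversed(range(n_steps)):
        z = traj[n]
        sech2 = 1.0 - math.tanh(theta * z) ** 2  # tanh'(theta * z)
        grad_theta += adj * h * z * sech2        # adj_{n+1} * dz_{n+1}/dtheta
        adj *= 1.0 + h * theta * sech2           # adj_n = adj_{n+1} * dz_{n+1}/dz_n
    return grad_theta

if __name__ == "__main__":
    g = discrete_adjoint_grad(0.5, 1.2, 0.1, 20)
    eps = 1e-6
    fd = (loss(0.5, 1.2 + eps, 0.1, 20) - loss(0.5, 1.2 - eps, 0.1, 20)) / (2 * eps)
    print(g, fd)  # the two values should agree to roughly 1e-8
```

Because the adjoint is derived from the same discrete steps that produced the loss, the gradient matches the finite-difference estimate up to round-off. This holds for any step size h, not only in the small-step limit.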
Findings
ANODE achieves stable training with unconditionally accurate gradients.
Memory footprint is reduced to O(L) + O(N_t), where L is the network depth and N_t the number of time steps, at comparable computational cost.
Results on CIFAR datasets demonstrate effective training of ResNet and SqueezeNext architectures.
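The memory figures above can be compared with a back-of-the-envelope count of stored activation tensors. This is a rough sketch that ignores constants. The "checkpoint" case assumes that O(L) + O(N_t) is reached by keeping only the input of each of the L ODE blocks and re-running one block's N_t steps at a time during backpropagation. The function and strategy names are hypothetical.

```python
def stored_activations(num_blocks, num_steps, strategy):
    """Rough count of activation tensors held in memory during backprop.

    num_blocks: L, number of ODE blocks (network depth)
    num_steps:  N_t, time steps per block
    Strategy names are illustrative labels, not a real API.
    """
    if strategy == "store_all":      # keep every step of every block
        return num_blocks * num_steps
    if strategy == "reverse_solve":  # recompute states by solving the ODE backward
        return num_blocks
    if strategy == "checkpoint":     # assumed: L block inputs + one block's N_t steps
        return num_blocks + num_steps
    raise ValueError(f"unknown strategy: {strategy}")

for s in ("store_all", "reverse_solve", "checkpoint"):
    print(s, stored_activations(10, 8, s))
```

With L = 10 and N_t = 8 the counts are 80, 10 and 18. The checkpointing variant trades a small additive N_t term for not having to reconstruct states by reversing the ODE.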
Abstract
Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently, a method proposed in [8] claimed that this memory overhead can be reduced from O(LN_t), where L is the depth of the network and N_t is the number of time steps, down to O(L) by solving the forward ODE backwards in time. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time…
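The two ideas in the abstract can be illustrated on a scalar toy problem. The first is that a residual block is one forward Euler step with h = 1. The second is that recovering earlier states by solving the ODE backward in time can go wrong. This is a simplified sketch, not the scheme used in [8]. It assumes a dissipative linear ODE dz/dt = -λz and uses explicit Euler in both directions, where [8] relies on general (for example adaptive) solvers. All names are hypothetical.

```python
def residual_block(x, f):
    """ResNet update x + f(x): a forward Euler step with unit time step."""
    return x + f(x)

def euler_step(z, f, h):
    """One forward Euler step for dz/dt = f(z)."""
    return z + h * f(z)

def forward_euler(z0, f, h, n_steps):
    z = z0
    for _ in range(n_steps):
        z = euler_step(z, f, h)
    return z

def reverse_euler(z_final, f, h, n_steps):
    """Try to recover z0 by solving the same ODE backward in time (step -h)."""
    z = z_final
    for _ in range(n_steps):
        z = z - h * f(z)
    return z

def reconstruction_error(lam, z0=1.0, h=0.1, n_steps=10):
    """|z0_reconstructed - z0| for the dissipative toy ODE dz/dt = -lam * z."""
    f = lambda z: -lam * z
    z_final = forward_euler(z0, f, h, n_steps)
    return abs(reverse_euler(z_final, f, h, n_steps) - z0)

for lam in (1.0, 5.0, 10.0):
    print(lam, reconstruction_error(lam))
```

Each forward step multiplies z by (1 - hλ) and each reverse step by (1 + hλ), so the recovered input is z0·(1 - h²λ²)^N_t rather than z0. At hλ = 1 the forward pass maps every input to 0, and no backward solve can recover it. This mirrors the information loss of a ReLU. Recomputing states from stored checkpoints avoids relying on this reversal.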
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced Numerical Analysis Techniques
Methods: Softmax · Xavier Initialization · Dense Connections · Spatially Separable Convolution · SqueezeNeXt Block · SqueezeNeXt · Average Pooling · 1x1 Convolution · Batch Normalization
