ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs

Amir Gholami; Kurt Keutzer; George Biros

arXiv:1902.10298 · cs.LG · July 2, 2019 · 79 cites

TL;DR

ANODE introduces an adjoint-based neural ODE framework that provides unconditionally accurate gradients and reduces memory costs, addressing stability issues of previous methods in training neural ODEs.

Contribution

The paper proposes ANODE, a novel adjoint-based method for neural ODEs that ensures numerical stability and gradient accuracy while maintaining low memory usage.

Findings

01 ANODE achieves stable training with unconditionally accurate gradients.

02 Memory footprint is reduced to O(L) + O(N_t) with comparable computational cost.

03 Results on CIFAR datasets demonstrate effective training of ResNet and SqueezeNext architectures.
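The O(L) + O(N_t) figure in finding 02 can be illustrated with a toy checkpointing sketch. It assumes the memory saving comes from storing only the input of each of the L blocks and recomputing the N_t time steps of one block at a time during the backward pass. The scalar dynamics `tanh(w * z)`, the function names, and the unit of "one stored state" are all hypothetical choices for illustration and are not taken from the paper's code.

```python
import math

def step(z, w, dt):
    """One forward Euler step of the toy ODE dz/dt = tanh(w * z)."""
    return z + dt * math.tanh(w * z)

def step_grad(z, w, dt):
    """Derivative of step(z, w, dt) with respect to z."""
    return 1.0 + dt * w * (1.0 - math.tanh(w * z) ** 2)

def grad_full_storage(z0, weights, n_steps, dt):
    """d z_final / d z0, keeping every intermediate state: L * N_t states."""
    states = []
    z = z0
    for w in weights:
        for _ in range(n_steps):
            states.append((z, w))
            z = step(z, w, dt)
    grad = 1.0
    for z_k, w in reversed(states):
        grad *= step_grad(z_k, w, dt)
    return grad, len(states)

def grad_checkpointed(z0, weights, n_steps, dt):
    """Same gradient, keeping block inputs plus one block's steps: L + N_t states."""
    checkpoints = []
    z = z0
    for w in weights:
        checkpoints.append(z)          # store only the input of each block
        for _ in range(n_steps):
            z = step(z, w, dt)
    grad = 1.0
    peak = 0
    for z_in, w in zip(reversed(checkpoints), reversed(weights)):
        block_states = []
        z = z_in
        for _ in range(n_steps):       # recompute this block's forward pass
            block_states.append(z)
            z = step(z, w, dt)
        peak = max(peak, len(checkpoints) + len(block_states))
        for z_k in reversed(block_states):
            grad *= step_grad(z_k, w, dt)
    return grad, peak

weights = [0.5, -1.0, 0.8, 0.3]        # L = 4 hypothetical blocks
g_full, mem_full = grad_full_storage(0.7, weights, n_steps=5, dt=0.2)
g_ckpt, mem_ckpt = grad_checkpointed(0.7, weights, n_steps=5, dt=0.2)
print(mem_full, mem_ckpt, math.isclose(g_full, g_ckpt))
```

Both routines differentiate the same discretized forward pass, so the gradients agree; the checkpointed version pays for the lower peak storage with one extra forward sweep per block.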

Abstract

Residual neural networks can be viewed as the forward Euler discretization of an Ordinary Differential Equation (ODE) with a unit time step. This has recently motivated researchers to explore other discretization approaches and train ODE-based networks. However, an important challenge of neural ODEs is their prohibitive memory cost during gradient backpropagation. Recently, a method proposed in [8] claimed that this memory overhead can be reduced from O(LN_t) down to O(L), where L is the depth of the network and N_t is the number of time steps, by solving the forward ODE backwards in time. However, we will show that this approach may lead to several problems: (i) it may be numerically unstable for ReLU/non-ReLU activations and general convolution operators, and (ii) the proposed optimize-then-discretize approach may lead to divergent training due to inconsistent gradients for small time…
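Two of the abstract's statements can be checked on a toy problem. The sketch below first confirms that a residual update z + f(z) coincides with a forward Euler step of size 1, then integrates the contracting ODE dz/dt = -lam * z forward and back again with the same Euler scheme to see how well the initial state is recovered. The single ReLU layer `f`, the scalar test equation, and all function names are assumptions made for illustration; this is not the paper's experiment.

```python
import numpy as np

def f(z, weight):
    """Hypothetical residual branch: a single ReLU layer."""
    return np.maximum(weight @ z, 0.0)

def residual_block(z, weight):
    """Standard residual update z + f(z)."""
    return z + f(z, weight)

def euler_step(z, weight, dt):
    """Forward Euler step of dz/dt = f(z) with step size dt."""
    return z + dt * f(z, weight)

def forward_then_reverse(z0, lam, total_time, n_steps):
    """Integrate dz/dt = -lam * z forward, then backwards in time, with Euler."""
    dt = total_time / n_steps
    z = z0
    for _ in range(n_steps):
        z = z + dt * (-lam * z)   # forward in time
    for _ in range(n_steps):
        z = z - dt * (-lam * z)   # same ODE, stepped backwards in time
    return z

rng = np.random.default_rng(0)
weight = rng.standard_normal((3, 3))
z = rng.standard_normal(3)
same = np.allclose(residual_block(z, weight), euler_step(z, weight, 1.0))

coarse = forward_then_reverse(1.0, lam=5.0, total_time=1.0, n_steps=10)
fine = forward_then_reverse(1.0, lam=5.0, total_time=1.0, n_steps=1000)
print(same, abs(coarse - 1.0), abs(fine - 1.0))
```

With few time steps the recovered initial state is far from the true value 1.0, and the error shrinks only as the step size is refined. This illustrates why activations reconstructed by reversing the ODE, and hence gradients computed from them, can be inaccurate.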

Taxonomy

Topics: Model Reduction and Neural Networks · Advanced Numerical Analysis Techniques

Methods: Softmax · Xavier Initialization · Dense Connections · Spatially Separable Convolution · SqueezeNeXt Block · SqueezeNeXt · Average Pooling · 1x1 Convolution · Batch Normalization