Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE
Juntang Zhuang, Nicha Dvornek, Xiaoxiao Li, Sekhar Tatikonda, Xenophon Papademetris, James Duncan

TL;DR
The paper introduces the Adaptive Checkpoint Adjoint (ACA) method for more accurate and efficient gradient estimation in Neural ODEs, improving performance on various tasks including image classification and physical modeling.
Contribution
ACA combines trajectory checkpointing, redundancy removal, and adaptive solvers to enhance gradient accuracy and training efficiency in Neural ODEs.
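A minimal Python sketch of how these three pieces could fit together on a toy problem. It is not the authors' implementation: the dynamics `f`, the Heun-Euler embedded pair, the tolerance, the halve-on-reject and double-on-accept rule, and all function names are assumptions chosen for illustration. The stepsize search may try several candidate steps, but only the accepted state is recorded as a checkpoint, so rejected trials never become part of what a backward pass would have to traverse.

```python
import math


def f(t, z):
    # Hypothetical toy dynamics dz/dt = -5 z (stands in for the learned network).
    return -5.0 * z


def heun_euler(t, z, h):
    # Embedded pair: Euler (order 1) and Heun (order 2) share the same stages.
    k1 = f(t, z)
    k2 = f(t + h, z + h * k1)
    z_low = z + h * k1
    z_high = z + 0.5 * h * (k1 + k2)
    # The difference between the two solutions serves as the local error estimate.
    return z_high, abs(z_high - z_low)


def solve_with_checkpoints(z0, t0, t1, h0=0.5, tol=1e-4):
    t, z, h = t0, z0, h0
    checkpoints = [(t, z)]  # accepted states only
    trials = 0              # every candidate step, accepted or rejected
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        while True:
            trials += 1
            z_new, err = heun_euler(t, z, h)
            if err <= tol:
                break
            h *= 0.5        # rejected trial: discard it and retry with a smaller step
        t += h
        z = z_new
        checkpoints.append((t, z))  # keep only the accepted result
        h *= 2.0            # optimistically try a larger step next time
    return z, checkpoints, trials


z_final, checkpoints, trials = solve_with_checkpoints(1.0, 0.0, 1.0)
accepted = len(checkpoints) - 1
print(f"accepted steps: {accepted}, stepsize trials: {trials}")
print(f"z(1) = {z_final:.6f}, exact = {math.exp(-5.0):.6f}")
```

In this sketch the number of stepsize trials exceeds the number of accepted steps. A backward pass that replays one local step per stored checkpoint would touch only the accepted steps, which is the sense in which the computation graph stays shallow.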
Findings
On image classification, ACA roughly halves the error rate relative to the adjoint and naive methods.
NODE trained with ACA surpasses ResNet in accuracy and reliability.
ACA improves performance in time-series modeling and physical simulations.
Abstract
Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g. image classification) is significantly inferior to that of discrete-layer models. We demonstrate that an explanation for their poorer performance is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-mode integration; the naive method directly back-propagates through ODE solvers, but suffers from a redundantly deep computation graph when searching for the optimal stepsize. We propose the Adaptive Checkpoint Adjoint (ACA) method: in automatic differentiation, ACA applies a trajectory checkpoint strategy which records the forward-mode trajectory as the reverse-mode trajectory to guarantee accuracy; ACA deletes redundant components for shallow computation graphs; and ACA supports adaptive…
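The reverse-mode error mentioned in the abstract can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's experiment: the linear dynamics `dz/dt = A*z`, the fixed-step Euler solver, the step count, and the names `forward` and `reverse_reconstruct` are all hypothetical. Integrating backward from z(T) with the same solver does not retrace the forward trajectory, whereas a stored checkpoint returns the forward state exactly.

```python
A = -5.0  # assumed decay rate for the toy dynamics


def f(z):
    # Hypothetical dynamics dz/dt = A * z.
    return A * z


def euler_step(z, h):
    # One explicit Euler step; a negative h steps backward in time.
    return z + h * f(z)


def forward(z0, t0, t1, n):
    # Forward solve that records every state (the "checkpoints").
    h = (t1 - t0) / n
    traj = [z0]
    for _ in range(n):
        traj.append(euler_step(traj[-1], h))
    return traj


def reverse_reconstruct(z_end, t0, t1, n):
    # Recover z(t0) by integrating the same ODE backward from z(t1),
    # without using any stored forward states.
    h = (t1 - t0) / n
    z = z_end
    for _ in range(n):
        z = euler_step(z, -h)
    return z


z0 = 1.0
traj = forward(z0, 0.0, 1.0, 20)
z0_reverse = reverse_reconstruct(traj[-1], 0.0, 1.0, 20)
z0_checkpoint = traj[0]

print(f"true z(0):               {z0}")
print(f"reverse-time estimate:   {z0_reverse:.4f}")
print(f"checkpointed z(0):       {z0_checkpoint}")
```

With these assumed settings each forward step multiplies the state by (1 + hA) and each reverse step by (1 - hA), so a round trip scales the state by (1 - h^2 A^2) per step instead of 1. The mismatch shrinks with smaller steps but does not vanish, while the checkpointed state has no such error.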
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications
Methods: 1x1 Convolution · Bottleneck Residual Block · Batch Normalization · Average Pooling · Max Pooling · Global Average Pooling · Residual Connection · Kaiming Initialization · Convolution
