Correcting Auto-Differentiation in Neural-ODE Training
Yewei Xu, Shi Chen, and Qin Li

TL;DR
This paper shows that auto-differentiation can produce inaccurate gradients when neural ODEs are trained with high-order discretizations, and proposes post-processing techniques that remove the resulting gradient oscillations and improve convergence.
Contribution
It identifies the artificial gradient oscillations that arise when auto-differentiation is applied through high-order methods (LMM and ERK), and offers simple post-processing corrections for Leapfrog and 2-stage ERK.
Findings
Auto-differentiation can introduce artificial oscillations in gradients for high-order methods.
For Leapfrog and 2-stage ERK, simple post-processing techniques eliminate the oscillations and correct the gradients.
Corrected gradients lead to better convergence in neural ODE training.
Abstract
Does the use of auto-differentiation yield reasonable updates for deep neural networks (DNNs)? Specifically, when DNNs are designed to adhere to neural ODE architectures, can we trust the gradients provided by auto-differentiation? Through mathematical analysis and numerical evidence, we demonstrate that when neural networks employ high-order methods, such as Linear Multistep Methods (LMM) or Explicit Runge-Kutta Methods (ERK), to approximate the underlying ODE flows, brute-force auto-differentiation often introduces artificial oscillations in the gradients that prevent convergence. In the case of Leapfrog and 2-stage ERK, we propose simple post-processing techniques that effectively eliminate these oscillations, correct the gradient computation, and thus return accurate updates.
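The oscillation described in the abstract can be seen in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's experiment: it takes the scalar ODE x'(t) = theta(t), discretizes it with Leapfrog (started by one Euler step), uses the loss L = x_N, and computes the gradient with a hand-written reverse sweep, which is what auto-differentiation would do for this recursion. The function names, the choice of loss, and the neighbor-averaging step at the end are all hypothetical; the averaging is only one plausible smoothing and is not claimed to be the authors' post-processing.

```python
import numpy as np

def leapfrog_forward(theta, x0, h):
    """Leapfrog for x' = theta(t): x[n+1] = x[n-1] + 2*h*theta[n]."""
    N = len(theta)
    x = np.empty(N + 1)
    x[0] = x0
    x[1] = x[0] + h * theta[0]              # one Euler step to start the scheme
    for n in range(1, N):
        x[n + 1] = x[n - 1] + 2.0 * h * theta[n]
    return x

def backprop_gradient(theta, h):
    """Reverse sweep through the recursion above for the loss L = x[N]."""
    N = len(theta)
    xbar = np.zeros(N + 1)                  # xbar[n] accumulates dL/dx[n]
    xbar[N] = 1.0
    grad = np.zeros(N)
    for n in range(N - 1, 0, -1):
        grad[n] = 2.0 * h * xbar[n + 1]     # from x[n+1] = x[n-1] + 2*h*theta[n]
        xbar[n - 1] += xbar[n + 1]
    grad[0] = h * xbar[1]                   # from the Euler start
    return grad

N = 8                                       # assumed even for this illustration
h = 1.0 / N
theta = np.zeros(N)
x = leapfrog_forward(theta, x0=0.0, h=h)
grad = backprop_gradient(theta, h)

# Continuous problem: x(T) = x0 + integral of theta, so each entry "should" be h.
reference = np.full(N, h)

# Hypothetical smoothing (an assumption): average adjacent entries.
smoothed = 0.5 * (grad[:-1] + grad[1:])

print("backprop gradient:", grad)           # alternates between 0 and 2*h
print("reference        :", reference)
print("smoothed         :", smoothed)       # every entry equals h
```

In this example the backpropagated gradient alternates between 0 and 2h because x_N depends only on every other theta_n, while the discretized continuous gradient is h in every entry. Averaging neighbors recovers h here, but that is specific to this linear toy problem with an even number of steps.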
