On Tuning Neural ODE for Stability, Consistency and Faster Convergence
Sheikh Waqas Akhtar

TL;DR
This paper introduces an ODE-solver for Neural-ODEs based on Nesterov's accelerated gradient (NAG), aiming to improve stability, convergence, and training speed, with reported benefits across several tasks.
Contribution
It proposes a novel NAG-based ODE-solver tuned for stability, consistency, and faster convergence, addressing limitations of existing solvers in Neural-ODE models.
Findings
Faster training times compared to traditional solvers
Achieves better or comparable performance across tasks
Improves stability and convergence in Neural-ODEs
Abstract
Neural-ODEs parameterize a differential equation using a continuous-depth neural network and solve it using a numerical ODE-integrator. These models offer a constant memory cost, in contrast to models with a discrete sequence of hidden layers, whose memory cost increases linearly with the number of layers. In addition to memory efficiency, other benefits of Neural-ODEs include adaptability of the evaluation approach to the input, and the flexibility to choose between numerical precision and fast training. However, despite all these benefits, they still have some limitations. We identify the ODE-integrator (also called the ODE-solver) as the weakest link in the chain, as it may have stability, consistency and convergence (CCS) issues and may converge slowly or not at all. We propose a first-order Nesterov's accelerated gradient (NAG) based ODE-solver which is proven to be tuned vis-a-vis CCS…
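To make the role of the ODE-integrator concrete, here is a minimal sketch of a Neural-ODE forward pass in Python with NumPy. It is only an illustration under stated assumptions: the one-layer `dynamics` network, its random weights, and the fixed-step forward Euler integrator `euler_solve` are all hypothetical stand-ins, and Euler is used purely for simplicity. It is not the NAG-based solver proposed in the paper.

```python
import numpy as np

def dynamics(h, t, params):
    """Hypothetical one-layer network f(h, t; params) that defines dh/dt."""
    W, b = params
    return np.tanh(W @ h + b)

def euler_solve(f, h0, t0, t1, n_steps, params):
    """Fixed-step forward Euler integrator (illustrative stand-in solver).

    Repeats h <- h + dt * f(h, t) from t0 to t1.
    """
    h = np.asarray(h0, dtype=float)
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        h = h + dt * f(h, t, params)
        t += dt
    return h

# Assumed toy setup: 4-dimensional hidden state, small random weights.
rng = np.random.default_rng(0)
params = (0.1 * rng.standard_normal((4, 4)), np.zeros(4))
h0 = rng.standard_normal(4)

# The step count trades numerical precision against compute.
h_coarse = euler_solve(dynamics, h0, 0.0, 1.0, 10, params)
h_fine = euler_solve(dynamics, h0, 0.0, 1.0, 1000, params)
```

In this sketch the number of solver steps plays the role that the number of layers plays in a discrete network, and the choice of integrator is the component the abstract singles out as the weakest link.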
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications
Methods: Batch Normalization · 1x1 Convolution · Max Pooling · Average Pooling · Global Average Pooling · Kaiming Initialization · Residual Block · Residual Connection · Convolution
