On Tuning Neural ODE for Stability, Consistency and Faster Convergence

Sheikh Waqas Akhtar

arXiv:2312.01657·cs.LG·March 27, 2025·1 cites

TL;DR

This paper introduces an ODE solver for Neural-ODEs based on Nesterov's accelerated gradient (NAG), improving stability, convergence, and training speed, with demonstrated benefits across various tasks.

Contribution

It proposes a novel NAG-based ODE-solver tuned for stability, consistency, and faster convergence, addressing limitations of existing solvers in Neural-ODE models.

Findings

01. Faster training times compared to traditional solvers

02. Achieves better or comparable performance across tasks

03. Improves stability and convergence in Neural-ODEs

Abstract

Neural-ODEs parameterize a differential equation using a continuous-depth neural network and solve it with a numerical ODE integrator. These models offer a constant memory cost, whereas in models with a discrete sequence of hidden layers the memory cost increases linearly with the number of layers. In addition to memory efficiency, other benefits of Neural-ODEs include adaptability of the evaluation approach to the input and the flexibility to choose between numerical precision and fast training. Despite these benefits, Neural-ODEs still have limitations. We identify the ODE integrator (also called the ODE solver) as the weakest link in the chain, as it may have stability, consistency and convergence (CCS) issues and may converge slowly or not at all. We propose a first-order Nesterov's accelerated gradient (NAG) based ODE solver which is proven to be tuned vis-a-vis CCS…
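To make the role of the ODE integrator concrete, here is a minimal sketch of a Neural-ODE forward pass with a fixed-step explicit Euler integrator. It is not the NAG-based solver proposed in the paper, whose details are not given in this summary. The names `make_dynamics` and `euler_solve`, the network sizes, and the random weights are all hypothetical choices made for illustration. The sketch shows two points from the abstract: the loop keeps only the current hidden state, and the step count trades numerical precision against compute.

```python
import numpy as np

def make_dynamics(seed=0, dim=2, hidden=8):
    """Build a small tanh network f(h, t) that stands in for dh/dt.
    Sizes and random weights are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(hidden, dim))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(dim, hidden))
    b2 = np.zeros(dim)

    def f(h, t):
        # Time-independent dynamics; t is accepted for a uniform interface.
        return W2 @ np.tanh(W1 @ h + b1) + b2

    return f

def euler_solve(f, h0, t0, t1, n_steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 with explicit Euler.
    Only the current state h is held, regardless of n_steps."""
    h = np.asarray(h0, dtype=float)
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)
        t += dt
    return h

# Sanity check on dynamics with a known solution: dh/dt = -h, h(1) = exp(-1).
decay = lambda h, t: -h
exact = np.exp(-1.0)
err_coarse = abs(euler_solve(decay, [1.0], 0.0, 1.0, 10)[0] - exact)
err_fine = abs(euler_solve(decay, [1.0], 0.0, 1.0, 1000)[0] - exact)
print("error with 10 steps:", err_coarse)
print("error with 1000 steps:", err_fine)

# Forward pass through the hypothetical network dynamics.
f = make_dynamics()
h1 = euler_solve(f, [1.0, -1.0], 0.0, 1.0, 100)
print("h(1) =", h1)
```

Explicit Euler is the simplest first-order scheme and can be unstable for stiff dynamics or large steps. That kind of stability and convergence concern is what the abstract attributes to the ODE solver.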

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications

Methods: Batch Normalization · 1x1 Convolution · Max Pooling · Average Pooling · Global Average Pooling · Kaiming Initialization · Residual Block · Residual Connection · Convolution