Learning by solving differential equations
Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Sourabh Medapati, Javier Gonzalvo

TL;DR
This paper investigates the application of higher-order Runge-Kutta ODE solvers in deep learning, evaluating their performance, limitations, and potential improvements by integrating modern optimizer techniques.
Contribution
It is the first comprehensive study of high-order RK methods in deep learning, and it proposes enhancing them with preconditioning, adaptive learning rates, and momentum.
Findings
Higher-order RK methods can improve training stability and accuracy.
Incorporating optimizer techniques enhances RK solver performance.
Limitations include computational cost and stability issues.
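The stability and cost trade-off can be illustrated on the standard linear test equation dθ/dt = -λθ, which is the gradient flow of the one-dimensional quadratic loss λθ²/2. This is a minimal sketch under that assumption and not an experiment from the paper; the function names and the `GRAD_EVALS_PER_STEP` table are hypothetical. Each step multiplies θ by an amplification factor that depends only on z = -hλ, and the iteration is stable when that factor has magnitude below 1.

```python
def euler_factor(z):
    # Amplification factor of one explicit Euler step, with z = -h * lam.
    return 1.0 + z

def rk4_factor(z):
    # Amplification factor of one classical RK4 step: the degree-4
    # Taylor polynomial of exp(z).
    return 1.0 + z + z**2 / 2.0 + z**3 / 6.0 + z**4 / 24.0

# Gradient evaluations needed per step (the cost side of the trade-off).
GRAD_EVALS_PER_STEP = {"euler": 1, "rk4": 4}

for h_lam in (0.5, 2.5, 3.0):
    z = -h_lam
    euler_stable = abs(euler_factor(z)) < 1.0
    rk4_stable = abs(rk4_factor(z)) < 1.0
    print(f"h*lam={h_lam}: euler stable={euler_stable}, rk4 stable={rk4_stable}")
```

On this toy problem, Euler diverges once hλ exceeds 2, while classical RK4 tolerates a somewhat larger step (it is still stable at hλ = 2.5 but not at hλ = 3) at four gradient evaluations per step instead of one. Deep learning losses are neither linear nor deterministic, so this only hints at the behavior the paper studies.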
Abstract
Modern deep learning algorithms use variations of gradient descent as their main learning methods. Gradient descent can be understood as the simplest Ordinary Differential Equation (ODE) solver; namely, the Euler method applied to the gradient flow differential equation. Since Euler, many ODE solvers have been devised that follow the gradient flow equation more precisely and more stably. Runge-Kutta (RK) methods provide a family of very powerful explicit and implicit high-order ODE solvers. However, these higher-order solvers have not found wide application in deep learning so far. In this work, we evaluate the performance of higher-order RK solvers when applied in deep learning, study their limitations, and propose ways to overcome these drawbacks. In particular, we explore how to improve their performance by naturally incorporating key ingredients of modern neural network optimizers…
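The abstract's central observation, that gradient descent is the Euler method applied to the gradient flow dθ/dt = -∇L(θ), can be sketched as follows. This is an illustrative example, not the authors' implementation: the quadratic loss, the matrix `A`, and all function names are assumptions chosen so that the exact flow is known in closed form. The higher-order solver shown is the classical fourth-order Runge-Kutta scheme, one member of the RK family.

```python
import numpy as np

# Hypothetical toy loss L(theta) = 0.5 * theta^T A theta, whose gradient is A @ theta.
A = np.diag([1.0, 10.0])

def grad(theta):
    return A @ theta

def euler_step(theta, h):
    # One Euler step on d(theta)/dt = -grad(theta).
    # This is exactly a gradient descent update with learning rate h.
    return theta - h * grad(theta)

def rk4_step(theta, h):
    # One classical fourth-order Runge-Kutta step on the same gradient flow.
    k1 = -grad(theta)
    k2 = -grad(theta + 0.5 * h * k1)
    k3 = -grad(theta + 0.5 * h * k2)
    k4 = -grad(theta + h * k3)
    return theta + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(step, theta0, h, n_steps):
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        theta = step(theta, h)
    return theta

theta0 = [1.0, 1.0]
h, n_steps = 0.05, 20  # follow the flow up to time t = h * n_steps = 1

# Exact gradient-flow solution for a diagonal A: theta_i(t) = exp(-a_i * t) * theta_i(0).
exact = np.exp(-np.diag(A) * h * n_steps) * np.array(theta0)

err_euler = np.linalg.norm(integrate(euler_step, theta0, h, n_steps) - exact)
err_rk4 = np.linalg.norm(integrate(rk4_step, theta0, h, n_steps) - exact)
print(f"Euler error: {err_euler:.2e}, RK4 error: {err_rk4:.2e}")
```

With the same step size, the RK4 iterate lands much closer to the exact gradient-flow trajectory than the Euler (gradient descent) iterate, which is the sense in which higher-order solvers "follow the gradient flow equation more precisely". Whether tracking the flow more closely helps in training is the question the paper evaluates.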
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Numerical methods for differential equations
