MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Johanni Brea, Flavio Martinelli, Berfin Şimşek, Wulfram Gerstner

TL;DR
MLPGradientFlow is a software package that numerically solves the gradient flow equations of multilayer perceptrons. Adaptive Runge-Kutta integrators are more accurate and converge faster than gradient descent with Adam, while Newton's method and BFGS are preferable for locating minima.
Contribution
The paper presents a new software package for numerically solving the gradient flow equations of multilayer perceptrons and shows improved accuracy, convergence speed, and efficiency over standard optimization methods.
Findings
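Adaptive Runge-Kutta integration methods outperform gradient descent with the Adam optimizer in accuracy and convergence speed.
Newton's method and BFGS find fixed points (local and global minima of the loss) more efficiently and accurately.
For small networks and data sets, gradients are usually computed faster than in PyTorch, and Hessians are computed faster as well.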
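Written out in the abstract's notation, the first two findings rest on a standard relation. With a fixed step size $\eta$ (a symbol introduced here for illustration), plain gradient descent is the forward Euler discretization of the gradient flow, while Newton's method aims directly at a fixed point $\theta^\ast$ with $\nabla\mathcal{L}(\theta^\ast;\mathcal{D}) = 0$:

```latex
\begin{aligned}
\dot{\theta}(t) &= -\nabla\mathcal{L}(\theta(t);\mathcal{D}) && \text{gradient flow}\\
\theta_{k+1} &= \theta_k - \eta\,\nabla\mathcal{L}(\theta_k;\mathcal{D}) && \text{forward Euler step = gradient descent}\\
\theta_{k+1} &= \theta_k - \left[\nabla^2\mathcal{L}(\theta_k;\mathcal{D})\right]^{-1}\nabla\mathcal{L}(\theta_k;\mathcal{D}) && \text{Newton step toward } \nabla\mathcal{L}(\theta^\ast;\mathcal{D}) = 0
\end{aligned}
```

Adam rescales and averages this Euler-type step, but its step size is not tied to an error estimate. Adaptive Runge-Kutta schemes instead choose each step from a local error estimate, so the discrete trajectory follows the continuous flow to a set tolerance; BFGS replaces the inverse Hessian in the Newton step with an approximation built from successive gradients.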
Abstract
MLPGradientFlow is a software package to numerically solve the gradient flow differential equation $\dot{\theta} = -\nabla\mathcal{L}(\theta;\mathcal{D})$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal{D}$ is some data set, and $\nabla\mathcal{L}$ is the gradient of a loss function. We show numerically that adaptive first- or higher-order integration methods based on Runge-Kutta schemes have better accuracy and convergence speed than gradient descent with the Adam optimizer. However, we find Newton's method and approximations like BFGS preferable to find fixed points (local and global minima of $\mathcal{L}$) efficiently and accurately. For small networks and data sets, gradients are usually computed faster than in PyTorch, and Hessians are computed at least … faster. Additionally, the package features an integrator for a teacher-student setup with…
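The workflow the abstract describes, following the gradient flow with an adaptive Runge-Kutta scheme and then polishing the end point into a fixed point with a quasi-Newton method, can be sketched with SciPy. This is a minimal illustration under stated assumptions and does not use or reproduce the MLPGradientFlow interface: the one-hidden-layer tanh network, the synthetic teacher-generated data set, the mean-squared-error loss, and all names (`loss`, `grad`, `flow`, `N_HID`, and so on) are hypothetical choices made here. Only `scipy.integrate.solve_ivp` and `scipy.optimize.minimize` are real library routines.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 2-3-1 tanh network fitted to data from a random "teacher"
N_IN, N_HID, N_SAMPLES = 2, 3, 50
X = rng.normal(size=(N_SAMPLES, N_IN))
teacher_W = rng.normal(size=(N_HID, N_IN))
teacher_a = rng.normal(size=N_HID)
Y = np.tanh(X @ teacher_W.T) @ teacher_a


def unpack(theta):
    """Split the flat parameter vector into hidden weights W and output weights a."""
    W = theta[: N_HID * N_IN].reshape(N_HID, N_IN)
    a = theta[N_HID * N_IN:]
    return W, a


def loss(theta):
    """Mean-squared-error loss L(theta; D), an assumed loss for illustration."""
    W, a = unpack(theta)
    residual = np.tanh(X @ W.T) @ a - Y
    return 0.5 * np.mean(residual ** 2)


def grad(theta):
    """Analytic gradient of `loss`, computed by backpropagation."""
    W, a = unpack(theta)
    H = np.tanh(X @ W.T)                                      # hidden activations
    residual = H @ a - Y
    grad_a = H.T @ residual / N_SAMPLES
    delta = residual[:, None] * a[None, :] * (1.0 - H ** 2)   # back through tanh
    grad_W = delta.T @ X / N_SAMPLES
    return np.concatenate([grad_W.ravel(), grad_a])


def flow(t, theta):
    """Right-hand side of the gradient flow ODE: d(theta)/dt = -grad L(theta)."""
    return -grad(theta)


theta0 = 0.5 * rng.normal(size=N_HID * N_IN + N_HID)

# Step 1: integrate the gradient flow with SciPy's adaptive Dormand-Prince 5(4) Runge-Kutta pair
sol = solve_ivp(flow, (0.0, 1e3), theta0, method="RK45", rtol=1e-6, atol=1e-9)
theta_flow = sol.y[:, -1]

# Step 2: refine the end point toward a fixed point with the quasi-Newton BFGS method
res = minimize(loss, theta_flow, jac=grad, method="BFGS", options={"gtol": 1e-10})

print(f"loss after flow: {loss(theta_flow):.3e}, after BFGS: {res.fun:.3e}")
print(f"|grad| after BFGS: {np.linalg.norm(grad(res.x)):.3e}")
```

Passing `method="DOP853"` to `solve_ivp` selects SciPy's higher-order adaptive Runge-Kutta pair instead; because the sketch only needs `loss` and `grad`, the same two-stage structure applies to any differentiable loss.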
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
Methods: Adam
