MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)

Johanni Brea; Flavio Martinelli; Berfin Şimşek; Wulfram Gerstner

arXiv:2301.10638·cs.LG·January 26, 2023

TL;DR

MLPGradientFlow is a software package that numerically solves the gradient flow equation for multilayer perceptrons. Adaptive Runge-Kutta integrators are more accurate and converge faster than gradient descent with the Adam optimizer, while Newton's method and approximations such as BFGS are preferable for finding minima efficiently and accurately.

Contribution

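The paper presents a software package for numerically solving the gradient flow differential equation of multilayer perceptrons, and shows numerically that adaptive Runge-Kutta integration achieves better accuracy and convergence speed than gradient descent with the Adam optimizer.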

Findings

01

Adaptive first- or higher-order Runge-Kutta methods have better accuracy and convergence speed than gradient descent with the Adam optimizer.

02

Newton's method and approximations like BFGS are preferable for finding fixed points (local and global minima of the loss) efficiently and accurately.

03

For small networks and data sets, gradients are usually computed faster than in PyTorch, and Hessians are computed at least 5× faster.
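The second finding can be illustrated with a minimal sketch. It does not use the MLPGradientFlow package: the one-parameter "network" tanh(w·x), the teacher weight 1.5, the data points, and all function names are assumptions chosen only to show why a second-order method reaches a fixed point more accurately than plain gradient descent under a fixed step budget.

```python
import math

# Hypothetical toy problem (not from the paper): a one-parameter "network"
# f(x) = tanh(w * x) fitted to targets produced by a teacher with w = 1.5.
XS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
W_STAR = 1.5
YS = [math.tanh(W_STAR * x) for x in XS]

def loss(w):
    """Mean squared error (with factor 1/2) over the toy data set."""
    return 0.5 * sum((math.tanh(w * x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)

def grad(w):
    """First derivative of the loss with respect to w."""
    total = 0.0
    for x, y in zip(XS, YS):
        t = math.tanh(w * x)
        total += (t - y) * (1 - t * t) * x
    return total / len(XS)

def hess(w):
    """Second derivative of the loss with respect to w."""
    total = 0.0
    for x, y in zip(XS, YS):
        t = math.tanh(w * x)
        s = 1 - t * t
        total += (s * s - 2 * (t - y) * t * s) * x * x
    return total / len(XS)

def gradient_descent(w, lr=0.1, steps=200):
    """Plain gradient descent with a fixed learning rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def damped_newton(w, tol=1e-12, max_iter=50):
    """Newton's method with step halving as a simple safeguard."""
    for _ in range(max_iter):
        g = grad(w)
        if abs(g) < tol:
            break
        h = hess(w)
        # Fall back to the gradient direction if the curvature is not positive.
        step = -g / h if h > 0 else -g
        for _ in range(20):
            if loss(w + step) <= loss(w):
                break
            step *= 0.5
        w += step
    return w

w_gd = gradient_descent(1.0)
w_newton = damped_newton(1.0)
print("gradient descent:", w_gd, "|gradient| =", abs(grad(w_gd)))
print("damped Newton:   ", w_newton, "|gradient| =", abs(grad(w_newton)))
```

Both runs start from w = 1.0. The curvature at the minimum is small in this toy problem, so gradient descent contracts slowly, whereas the Newton step rescales the gradient by the inverse curvature and closes the gap in a handful of iterations.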

Abstract

MLPGradientFlow is a software package to solve numerically the gradient flow differential equation $\dot{\theta} = -\nabla L(\theta; D)$, where $\theta$ are the parameters of a multi-layer perceptron, $D$ is some data set, and $\nabla L$ is the gradient of a loss function. We show numerically that adaptive first- or higher-order integration methods based on Runge-Kutta schemes have better accuracy and convergence speed than gradient descent with the Adam optimizer. However, we find Newton's method and approximations like BFGS preferable to find fixed points (local and global minima of $L$) efficiently and accurately. For small networks and data sets, gradients are usually computed faster than in PyTorch and Hessians are computed at least $5 \times$ faster. Additionally, the package features an integrator for a teacher-student setup with…
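As a rough illustration of what integrating the gradient flow with an adaptive Runge-Kutta scheme involves, the sketch below applies an embedded Heun-Euler pair with step-size control. It is not the package's integrator or API: the quadratic loss stands in for a perceptron loss because its flow has a closed-form solution to compare against, and `LAMBDAS`, `neg_grad`, `gradient_flow`, and the tolerance are all assumptions made for this example.

```python
import math

# Hypothetical stand-in loss (not from the paper): L(theta) = 0.5 * sum(lam_i * theta_i^2).
# Its gradient flow has the exact solution theta_i(t) = theta_i(0) * exp(-lam_i * t).
LAMBDAS = [1.0, 10.0]

def loss(theta):
    return 0.5 * sum(l * th * th for l, th in zip(LAMBDAS, theta))

def neg_grad(theta):
    """Right-hand side of the gradient flow: d(theta)/dt = -grad L(theta)."""
    return [-l * th for l, th in zip(LAMBDAS, theta)]

def gradient_flow(theta, t_end, tol=1e-6, h=1e-3):
    """Integrate the flow to t_end with an adaptive Heun-Euler (order 2/1) pair."""
    t, step_sizes = 0.0, []
    while t < t_end:
        h = min(h, t_end - t)  # do not step past the final time
        k1 = neg_grad(theta)
        euler = [th + h * a for th, a in zip(theta, k1)]
        k2 = neg_grad(euler)
        heun = [th + 0.5 * h * (a + b) for th, a, b in zip(theta, k1, k2)]
        # The difference between the two estimates serves as the local error estimate.
        err = max(abs(p - q) for p, q in zip(heun, euler))
        if err <= tol:
            theta, t = heun, t + h  # accept the higher-order estimate
            step_sizes.append(h)
        # Shrink after a rejection, grow when the error is well below tolerance.
        factor = 5.0 if err == 0.0 else 0.9 * math.sqrt(tol / err)
        h *= min(5.0, max(0.2, factor))
    return theta, step_sizes

theta0 = [1.0, 1.0]
t_end = 5.0
theta_T, step_sizes = gradient_flow(theta0, t_end)
exact = [th * math.exp(-l * t_end) for th, l in zip(theta0, LAMBDAS)]
max_error = max(abs(a - b) for a, b in zip(theta_T, exact))
print("accepted steps:", len(step_sizes))
print("smallest / largest step:", min(step_sizes), max(step_sizes))
print("max deviation from exact flow:", max_error)
```

The controller takes small steps while the fast component (rate 10) is still decaying and larger ones afterwards, which is the behaviour a fixed learning rate cannot reproduce.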

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques

Methods: Adam