Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence

Diyuan Wu; Vyacheslav Kungurtsev; Marco Mondelli

arXiv:2210.06819·cs.LG·February 7, 2023

Open Access

TL;DR

This paper gives a theoretical analysis of the stochastic heavy ball method (SHB) for two- and three-layer neural networks. Using a mean-field approach, it shows that the solutions found by SHB are stable under dropout, connected along a low-loss path, and convergent to the global optimum, extending a line of work that previously covered only SGD.

Contribution

The paper introduces a mean-field framework for analyzing SHB, that is, SGD with Polyak's momentum, and uses it to establish global convergence, dropout stability, and connectivity for two- and three-layer neural networks.

Findings

1. Proves existence and uniqueness of mean-field limit differential equations for SHB.

2. Shows convergence of SHB to the global optimum in neural networks.

3. Establishes dropout stability and connectivity of SHB solutions.

Abstract

The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: (i) stability after dropping out part of the neurons, (ii) connectivity along a low-loss path, and (iii) convergence to the global optimum. To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD while, in contrast, our paper considers an algorithm with momentum.…
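
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of the textbook heavy ball update, w_next = w - lr * grad + beta * (w - w_prev), applied to a one-parameter least-squares problem. It is not the paper's setting (which concerns two- and three-layer networks in the large-width limit) and not the authors' code; the function name `shb_least_squares`, the toy data, and the values of `lr`, `beta`, and `steps` are assumptions chosen purely for illustration.

```python
import random


def shb_least_squares(data, lr=0.05, beta=0.9, steps=2000, seed=0):
    """Fit y ~ w * x with a stochastic heavy ball update (illustrative only)."""
    rng = random.Random(seed)
    w = 0.0       # current iterate
    w_prev = 0.0  # previous iterate, needed for the momentum term
    for _ in range(steps):
        x, y = rng.choice(data)   # draw one sample per step (the "stochastic" part)
        grad = (w * x - y) * x    # gradient of 0.5 * (w * x - y) ** 2 with respect to w
        # Heavy ball step: gradient step plus Polyak momentum beta * (w - w_prev).
        w_next = w - lr * grad + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w


# Hypothetical noiseless toy data generated from y = 3 * x.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 11)]
w_hat = shb_least_squares(data)
print(w_hat)
```

Setting `beta=0.0` in this sketch recovers plain SGD, which is the algorithm the earlier mean-field literature mentioned in the abstract focuses on.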

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM

Methods: Stochastic Gradient Descent