Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
Diyuan Wu, Vyacheslav Kungurtsev, Marco Mondelli

TL;DR
This paper gives a mean-field analysis of the stochastic heavy ball method (SHB) for two- and three-layer neural networks. It shows that the solutions SHB finds are stable under dropout, connected along low-loss paths, and convergent to the global optimum, extending a line of mean-field results for SGD to an algorithm with momentum.
Contribution
It introduces a mean-field framework for analyzing SHB (SGD with Polyak's momentum) and uses it to establish global convergence, dropout stability, and connectivity for two- and three-layer neural networks.
Findings
Proves existence and uniqueness of solutions to the mean-field limit differential equations for SHB.
Shows convergence of SHB to the global optimum for two- and three-layer neural networks.
Establishes dropout stability and connectivity of SHB solutions.
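Dropout stability can be illustrated numerically. The sketch below assumes one common formalization: a network is dropout-stable if removing half of its neurons and rescaling the rest changes the loss only slightly. The two-layer mean-field parametrization, the tanh activation, the target function, and every name in the code are illustrative assumptions. The parameters are drawn i.i.d. as a stand-in for a trained network; they are not the output of SHB.

```python
import numpy as np

def two_layer_mean_field(x, a, w):
    """Assumed mean-field network: f(x) = (1/N) * sum_i a_i * tanh(w_i * x)."""
    # Dividing by a.size rescales automatically when a sub-network is passed in.
    return np.tanh(np.outer(x, w)) @ a / a.size

def mse(pred, y):
    """Squared loss averaged over the sample, with the usual factor 1/2."""
    return 0.5 * np.mean((pred - y) ** 2)

rng = np.random.default_rng(0)
n_neurons = 20_000
a = np.ones(n_neurons)                      # hypothetical output weights
w = rng.uniform(0.0, 2.0, size=n_neurons)   # hypothetical i.i.d. hidden weights
x = np.linspace(-2.0, 2.0, 200)             # scalar inputs
y = np.tanh(x)                              # hypothetical regression target

loss_full = mse(two_layer_mean_field(x, a, w), y)

# Drop the second half of the neurons; the survivors are averaged over N/2.
half = n_neurons // 2
loss_dropout = mse(two_layer_mean_field(x, a[:half], w[:half]), y)

dropout_gap = abs(loss_full - loss_dropout)
print(f"full loss {loss_full:.5f}, dropout loss {loss_dropout:.5f}, gap {dropout_gap:.2e}")
```

Because the neurons here are i.i.d. samples from one distribution, the half-width network approximates the same average and the gap shrinks as the width grows. This is the intuition a mean-field description makes available, under the assumptions stated above.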
Abstract
The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: (i) stability after dropping out part of the neurons, (ii) connectivity along a low-loss path, and (iii) convergence to the global optimum. To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD while, in contrast, our paper considers an algorithm with momentum.…
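For readers unfamiliar with the update rule, here is a minimal sketch of SGD with Polyak's momentum in its textbook form. The paper's exact parametrization and scaling may differ. The quadratic objective, the Gaussian gradient noise, and all names and hyperparameter values are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def shb_step(theta, velocity, stochastic_grad, lr, momentum):
    """One heavy ball step: v <- momentum * v - lr * g, then theta <- theta + v."""
    velocity = momentum * velocity - lr * stochastic_grad
    theta = theta + velocity
    return theta, velocity

def run_shb(theta0, n_steps=300, lr=0.1, momentum=0.9, noise_std=0.01, seed=0):
    """Run SHB on the toy objective 0.5 * ||theta||^2 with noisy gradients."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(n_steps):
        # The exact gradient is theta; additive noise stands in for minibatch noise.
        grad = theta + noise_std * rng.standard_normal(theta.shape)
        theta, velocity = shb_step(theta, velocity, grad, lr, momentum)
    return theta

theta_final = run_shb(np.ones(5))
print("distance to optimum:", np.linalg.norm(theta_final))
```

Setting `momentum=0.0` reduces `shb_step` to a plain SGD step, which is the sense in which SHB generalizes SGD.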
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM
Methods: Stochastic Gradient Descent
