Multiplicative noise and heavy tails in stochastic optimization

Liam Hodgkinson; Michael W. Mahoney

arXiv:2006.06293 · stat.ML · June 12, 2020 · 32 cites

TL;DR

This paper investigates how multiplicative noise in stochastic optimization leads to heavy-tailed parameter distributions, affecting convergence and exploration in neural network training.

Contribution

It provides a theoretical framework linking multiplicative noise to heavy tails and demonstrates its impact across various models and optimizers, supported by empirical evidence.

Findings

1. Multiplicative noise causes heavy-tailed stationary distributions in parameters.

2. Heavy tails improve exploration and basin hopping in non-convex optimization.

3. Results hold for a wide range of models, optimizers, and real neural network training scenarios.

Abstract

Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular, the precise role of the stochasticity, still remain unclear. Modelling stochastic optimization algorithms as discrete random recurrence relations, we show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in heavy-tailed stationary behaviour in the parameters. A detailed analysis is conducted for SGD applied to a simple linear regression problem, followed by theoretical results for a much larger class of models (including non-linear and non-convex) and optimizers (including momentum, Adam, and stochastic Newton), demonstrating that our qualitative results hold much more generally. In each case, we describe dependence on key factors, including step size, batch size, and data variability, all of…
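The linear regression case mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code; it assumes one-dimensional least squares with standard Gaussian inputs, a true weight of zero, Gaussian label noise of scale 0.1, and batch size 1, and the names `simulate_sgd_1d` and `excess_kurtosis` are hypothetical. Under these assumptions each SGD step has the form w ← A·w + B with A = 1 − γx² and B = γxy, so the iterate is multiplied by a random factor at every step. Classical results for such recurrences (Kesten-type theorems) suggest a power-law tail whose index α solves E|A|^α = 1, so a larger step size γ should give visibly heavier tails.

```python
import numpy as np


def simulate_sgd_1d(step_size, n_steps=1000, n_chains=5000, seed=0):
    """Run independent chains of batch-size-1 SGD on 1D least squares.

    Assumed toy setup (not from the paper): x ~ N(0, 1), true weight 0,
    labels y ~ N(0, 0.1^2). Returns the final iterate of every chain,
    which approximates a sample from the stationary distribution.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_chains)
    for _ in range(n_steps):
        x = rng.standard_normal(n_chains)
        y = 0.1 * rng.standard_normal(n_chains)
        # Gradient of 0.5 * (x * w - y)^2 with respect to w is x * (x * w - y).
        # Rearranged, the update is w <- (1 - step_size * x^2) * w + step_size * x * y,
        # i.e. a random multiplicative factor plus an additive term.
        w = w - step_size * x * (x * w - y)
    return w


def excess_kurtosis(samples):
    """Sample excess kurtosis (close to 0 for Gaussian data)."""
    centered = samples - samples.mean()
    return (centered ** 4).mean() / (centered ** 2).mean() ** 2 - 3.0


if __name__ == "__main__":
    # For step_size = 0.5 with x ~ N(0, 1): E[A^2] = 0.75 < 1 but
    # E[A^4] = 2.5625 > 1, so the predicted tail index lies between 2 and 4
    # and the fourth moment of the stationary law is infinite.
    for gamma in (0.05, 0.5):
        w = simulate_sgd_1d(gamma)
        print(f"step size {gamma}: excess kurtosis = {excess_kurtosis(w):.2f}")
```

Sample kurtosis is a crude diagnostic and is unstable when the fourth moment is infinite, so the printed value for the larger step size will vary with the seed and the number of chains. A tail-index estimator or a log-log plot of the empirical survival function would give a more careful picture.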

Videos

Multiplicative Noise and Heavy Tails in Stochastic Optimization · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research

Methods: Stochastic Gradient Descent · Adam · Linear Regression