An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate

Sayar Karmakar; Anirbit Mukherjee

arXiv:2204.12554·cs.LG·April 28, 2022

TL;DR

This paper experimentally studies the heavy-tail index of stochastic gradient descent (SGD) when training a ReLU gate, and conjectures that SGD and a provably convergent variant of it show similar heavy-tail behaviour.

Contribution

It provides the first experimental analysis of heavy-tail indices in ReLU gate training and compares behaviors of different algorithms in this context.

Findings

01

Heavy-tail index varies with input dimension, batch size, and step size.

02

SGD and the variant from Karmakar and Mukherjee (2022) are conjectured to show similar heavy-tail behaviour on data where the variant can be proven to converge.

03

Distinct heavy-tail properties are observed compared to linear models and large neural networks.

Abstract

A recent line of work on stochastic deep-learning algorithms has uncovered a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to show an interesting dependence on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for SGD while training a ReLU gate (in the realizable and in the binary classification setups) and for a variant of SGD that was proven to converge in Karmakar and Mukherjee (2022) on ReLU realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late time…
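
As a rough illustration of the kind of experiment the abstract describes, the following sketch runs mini-batch SGD on a single ReLU gate with realizable labels and estimates a tail index from a set of samples. It is not the authors' code: the Gaussian inputs and initialization, the squared loss, the use of the Hill estimator, and the names `sgd_relu_gate` and `hill_tail_index` are all assumptions made for illustration, and the paper may use a different estimator and setup.

```python
import numpy as np


def relu(z):
    """Elementwise ReLU activation."""
    return np.maximum(z, 0.0)


def sgd_relu_gate(w_star, step_size, batch_size, n_steps, rng):
    """Mini-batch SGD on the squared loss of a single ReLU gate.

    Labels are realizable: y = relu(<w_star, x>).
    Inputs and the initial weight are drawn from a standard Gaussian,
    which is an assumption of this sketch, not something the source states.
    """
    d = w_star.shape[0]
    w = rng.standard_normal(d)
    for _ in range(n_steps):
        x = rng.standard_normal((batch_size, d))
        y = relu(x @ w_star)
        pre = x @ w
        residual = relu(pre) - y
        # Gradient of 0.5 * (relu(<w, x>) - y)^2, averaged over the batch.
        grad = ((residual * (pre > 0))[:, None] * x).mean(axis=0)
        w = w - step_size * grad
    return w


def hill_tail_index(samples, k):
    """Hill estimator of the tail index from the k largest positive samples.

    The Hill estimator is a stand-in chosen for this sketch; it requires
    0 < k < len(samples) and a positive (k+1)-th largest sample.
    """
    s = np.sort(np.asarray(samples, dtype=float))[::-1]
    return 1.0 / np.mean(np.log(s[:k] / s[k]))
```

One might call `sgd_relu_gate` over many independent seeds, collect a scalar statistic of the final iterates, and pass those values to `hill_tail_index` while varying the input dimension, batch size and step size. The choice of `k` strongly affects the Hill estimate, so any number it returns should be read with care.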

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms