An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate
Sayar Karmakar, Anirbit Mukherjee

TL;DR
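This paper experimentally studies the heavy-tailed behaviour that arises when a ReLU gate is trained with stochastic gradient descent (S.G.D.) and with a variant of S.G.D. that has a convergence proof. It conjectures that the two algorithms have similar heavy-tail behaviour wherever the variant can be proven to converge.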
Contribution
It provides the first experimental analysis of heavy-tail indices in ReLU gate training and compares behaviors of different algorithms in this context.
Findings
The heavy-tail index varies with the input dimension, the mini-batch size, and the step size.
The algorithms studied exhibit similar heavy-tail behaviour in scenarios where they converge.
The heavy-tail properties observed for a ReLU gate differ from those of linear models and large neural networks.
Abstract
One direction of recent progress on stochastic deep-learning algorithms has been uncovering the rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a ReLU gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven to converge in Karmakar and Mukherjee (2022) on ReLU realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late time…
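The setup described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names `sgd_relu_gate` and `hill_estimator`, the Gaussian inputs, the squared loss, and the use of the Hill estimator as the tail-index estimator. The excerpt does not say how the authors sample the iterates or which estimator they use, so the estimator is checked only on synthetic Pareto data with a known index of 2.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sgd_relu_gate(w_star, steps, batch_size, step_size, rng):
    """Mini-batch SGD on the squared loss of a single ReLU gate.

    Assumed (hypothetical) data model: x ~ N(0, I) and realizable
    labels y = relu(<w_star, x>). Returns the norm of each iterate.
    """
    dim = w_star.shape[0]
    w = rng.standard_normal(dim)
    norms = np.empty(steps)
    for t in range(steps):
        x = rng.standard_normal((batch_size, dim))
        y = relu(x @ w_star)
        pre = x @ w
        # Subgradient of 0.5 * mean((relu(pre) - y)**2) with respect to w.
        grad = ((relu(pre) - y) * (pre > 0)) @ x / batch_size
        w = w - step_size * grad
        norms[t] = np.linalg.norm(w)
    return norms

def hill_estimator(samples, k):
    """Hill estimate of the tail index from the k largest positive samples."""
    s = np.sort(np.asarray(samples, dtype=float))[::-1]
    top, threshold = s[:k], s[k]
    return 1.0 / np.mean(np.log(top / threshold))

rng = np.random.default_rng(0)

# One SGD run; dimension, batch size and step size are the knobs that the
# abstract says the heavy-tail index depends on.
norms = sgd_relu_gate(np.ones(5), steps=500, batch_size=8,
                      step_size=0.05, rng=rng)

# Sanity check of the estimator on exact Pareto samples with tail index 2.
pareto = (1.0 - rng.random(100_000)) ** (-1.0 / 2.0)
alpha_hat = hill_estimator(pareto, k=1000)
```

To probe the dependence on dimension, batch size, or step size, one could repeat the run over a grid of those parameters and pass late-time samples to `hill_estimator`. The choice of `k` and of which late-time quantity to sample are further assumptions of this sketch.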
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
