On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks