Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks