Optimal Condition for Initialization Variance in Deep Neural Networks: An SGD Dynamics Perspective