Improving Generalization Performance by Switching from Adam to SGD