Flatter, faster: scaling momentum for optimal speedup of SGD