Entropy-SGD: Biasing Gradient Descent Into Wide Valleys