Small nonlinearities in activation functions create bad local minima in neural networks