Loading paper
Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate | Tomesphere