Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization