Why is parameter averaging beneficial in SGD? An objective smoothing perspective