Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling
Manojlo Vukovic, Dusan Jakovetic

TL;DR
This paper introduces R-SGD-Mini, a robust stochastic gradient method using medoid mini-batch sampling to handle heavy-tailed noise, with proven convergence rates and favorable experimental performance.
Contribution
The paper proposes a novel medoid-based mini-batch gradient sampling method for heavy-tailed noise, providing explicit convergence bounds and high-probability guarantees.
Findings
R-SGD-Mini converges at rate O(T^{-1}) in expectation.
The method achieves a rate of O(T^{-1/2}) when the time horizon is known.
Experimental results favor R-SGD-Mini over traditional methods.
Abstract
We consider a first order stochastic optimization framework where, at each iteration, independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In this paper, we take an alternative approach and propose a novel stochastic first order method dubbed Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling, R-SGD-Mini for short. The core idea of R-SGD-Mini is to split the data batch into distinct data chunks, form for each chunk the stochastic gradient, and update the solution estimate with respect to the…
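The chunk-and-select idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it states the exact update rule, so everything below is an assumption rather than the authors' algorithm: the medoid is taken under Euclidean distance, each chunk gradient is the average of its per-sample gradients, and the names `medoid`, `medoid_sgd_step`, `grad_fn`, `num_chunks`, and `step_size` are hypothetical.

```python
import numpy as np


def medoid(vectors):
    """Return the row with the smallest total Euclidean distance to all other rows."""
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # pairwise distance matrix
    return vectors[np.argmin(dists.sum(axis=1))]


def medoid_sgd_step(x, batch, grad_fn, num_chunks, step_size):
    """One illustrative update (assumed form, not taken from the paper).

    Splits the batch into chunks, averages the per-sample gradients inside
    each chunk, and steps along the medoid of the chunk gradients.
    """
    chunks = np.array_split(batch, num_chunks)
    chunk_grads = np.stack([
        np.mean([grad_fn(x, sample) for sample in chunk], axis=0)
        for chunk in chunks
    ])
    return x - step_size * medoid(chunk_grads)


# Toy demo: five one-dimensional "gradients", one of them a heavy-tailed outlier.
grads = np.array([[0.0], [1.0], [2.0], [3.0], [1000.0]])
print(medoid(grads))        # the medoid is 2.0
print(grads.mean(axis=0))   # the plain average is 201.2
```

Unlike the plain mini-batch average, the medoid is always one of the observed chunk gradients, so a single extreme sample cannot drag the update direction arbitrarily far. That is the intuition this sketch is meant to show, not a substitute for the paper's analysis.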
