Robust Stochastic Optimization via Gradient Quantile Clipping
Ibrahim Merad, Stéphane Gaïffas

TL;DR
This paper presents a gradient clipping method for stochastic gradient descent (SGD) that uses quantiles of the gradient norm as clipping thresholds. The method handles heavy-tailed data and outliers, and it comes with theoretical convergence guarantees.
Contribution
Introduces a quantile-based gradient clipping strategy for SGD that improves robustness and efficiency, backed by a theoretical analysis and a practical implementation.
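For concreteness, one natural way to write a constant step size SGD update with a quantile clipping threshold is shown below. The notation is ours and is an assumption, not taken from the paper: β is the constant step size, G(θ, ζ) is a stochastic gradient at θ computed from a sample ζ, clip is the usual norm-clipping operator, and Q_p is the p-quantile of the gradient norm distribution at the current iterate.

```latex
\theta_{t+1} = \theta_t - \beta \, \operatorname{clip}_{\tau_t}\!\big(G(\theta_t, \zeta_t)\big),
\qquad
\operatorname{clip}_{\tau}(g) = \min\!\Big(1, \frac{\tau}{\|g\|}\Big)\, g,
\qquad
\tau_t = Q_p\big(\|G(\theta_t, \zeta)\|\big)
```

Clipping only rescales the gradient and never changes its direction, so each step has norm at most β τ_t.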
Findings
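Proves that the method tolerates heavy-tailed samples (including infinite variance) and a fraction of outliers in the data stream, akin to Huber contamination.
Establishes convergence guarantees for smooth objectives: for strongly convex ones, high-probability bounds on the final estimation error; for non-convex ones, a limit distribution localized on a neighborhood with low gradient.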
Demonstrates strong empirical performance through numerical experiments.
Abstract
We introduce a clipping strategy for Stochastic Gradient Descent (SGD) which uses quantiles of the gradient norm as clipping thresholds. We prove that this new strategy provides a robust and efficient optimization algorithm for smooth objectives (convex or non-convex), that tolerates heavy-tailed samples (including infinite variance) and a fraction of outliers in the data stream akin to Huber contamination. Our mathematical analysis leverages the connection between constant step size SGD and Markov chains and handles the bias introduced by clipping in an original way. For strongly convex objectives, we prove that the iteration converges to a concentrated distribution and derive high probability bounds on the final estimation error. In the non-convex case, we prove that the limit distribution is localized on a neighborhood with low gradient. We propose an implementation of this algorithm…
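A minimal sketch of quantile-clipped SGD, applied to mean estimation (a strongly convex objective) on a heavy-tailed stream with a fraction of outliers. The paper's own implementation is elided in the abstract above, so several choices here are assumptions: the quantile is estimated from a rolling window of recent gradient norms, and the names `quantile_clipped_sgd` and `contaminated_stream`, the level `p=0.5`, the window size 50, the step size 0.05, and the outlier value are all hypothetical.

```python
import math
import random
from collections import deque


def quantile(values, p):
    """Empirical p-quantile (nearest-rank definition) of a non-empty sequence."""
    s = sorted(values)
    k = min(len(s) - 1, max(0, math.ceil(p * len(s)) - 1))
    return s[k]


def clip(g, tau):
    """Rescale g so its Euclidean norm is at most tau; the direction is kept."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= tau:
        return list(g)
    return [x * tau / norm for x in g]


def quantile_clipped_sgd(grad, theta0, samples, step=0.05, p=0.5, window=50):
    """Constant step size SGD whose clipping threshold is the p-quantile of a
    rolling window of recent gradient norms (hypothetical quantile estimator)."""
    theta = list(theta0)
    norms = deque(maxlen=window)  # keeps only the most recent gradient norms
    for x in samples:
        g = grad(theta, x)
        norms.append(math.sqrt(sum(v * v for v in g)))
        tau = quantile(norms, p)  # threshold from recent norms, current one included
        theta = [t - step * c for t, c in zip(theta, clip(g, tau))]
    return theta


def contaminated_stream(mean, n, eps=0.05, alpha=1.5, seed=0):
    """Heavy-tailed samples around `mean`; a fraction eps is replaced by a fixed
    outlier, in the spirit of Huber contamination (hypothetical data model)."""
    rng = random.Random(seed)
    for _ in range(n):
        if rng.random() < eps:
            yield [100.0] * len(mean)  # outlier far from the true mean
        else:
            # Symmetric Pareto noise: zero mean for alpha > 1, infinite variance for alpha <= 2
            yield [m + rng.choice((-1.0, 1.0)) * (rng.paretovariate(alpha) - 1.0) for m in mean]


def mean_grad(theta, x):
    """Gradient of f(theta) = ||theta - x||^2 / 2 for a single sample x."""
    return [t - xi for t, xi in zip(theta, x)]


true_mean = [1.0, -2.0]
est = quantile_clipped_sgd(mean_grad, [0.0, 0.0], contaminated_stream(true_mean, 5000))
print(est)
```

Setting `p=1.0` and `window=1` makes the threshold equal to the current gradient norm, so nothing is clipped and the function reduces to plain constant step size SGD, a convenient baseline. With clipping on, a single outlier or extreme heavy-tailed sample can move the iterate by at most `step * tau`, whereas an unclipped update absorbs it in full.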
Taxonomy
Topics: Statistical Methods and Inference · Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
