Robust Stochastic Optimization via Gradient Quantile Clipping

Ibrahim Merad; Stéphane Gaïffas

arXiv:2309.17316 · stat.ML · October 15, 2024

Open Access

TL;DR

This paper presents a gradient clipping method for stochastic gradient descent that uses quantiles of the gradient norm as clipping thresholds. The method tolerates heavy-tailed data and outliers, and comes with theoretical convergence guarantees.

Contribution

Introduces a clipping strategy for SGD that uses quantiles of the gradient norm as thresholds, supported by a theoretical analysis and a proposed implementation.

Findings

01. Proves robustness of the method against heavy-tailed distributions and outliers.

02. Establishes convergence guarantees for both convex and non-convex objectives.

03. Demonstrates strong empirical performance through numerical experiments.

Abstract

We introduce a clipping strategy for Stochastic Gradient Descent (SGD) which uses quantiles of the gradient norm as clipping thresholds. We prove that this new strategy provides a robust and efficient optimization algorithm for smooth objectives (convex or non-convex), that tolerates heavy-tailed samples (including infinite variance) and a fraction of outliers in the data stream akin to Huber contamination. Our mathematical analysis leverages the connection between constant step size SGD and Markov chains and handles the bias introduced by clipping in an original way. For strongly convex objectives, we prove that the iteration converges to a concentrated distribution and derive high probability bounds on the final estimation error. In the non-convex case, we prove that the limit distribution is localized on a neighborhood with low gradient. We propose an implementation of this algorithm…
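The clipping idea described in the abstract can be illustrated with a minimal Python sketch. It assumes the quantile threshold is estimated from a rolling window of recently observed gradient norms; that estimator, and the names `quantile_clipped_sgd`, `grad_fn`, `p`, and `window`, are hypothetical choices made for this illustration and are not taken from the paper, whose own implementation is not described in the text above.

```python
from collections import deque

import numpy as np


def quantile_clipped_sgd(grad_fn, theta0, samples, step_size=0.05,
                         p=0.9, window=100):
    """Constant-step SGD with gradients clipped at a quantile of recent norms.

    grad_fn(theta, sample) returns a stochastic gradient as a NumPy array.
    The rolling-window quantile estimate is an assumption of this sketch.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    recent_norms = deque(maxlen=window)  # keeps only the last `window` norms
    for sample in samples:
        g = np.asarray(grad_fn(theta, sample), dtype=float)
        norm = float(np.linalg.norm(g))
        recent_norms.append(norm)
        # Clipping threshold: empirical p-quantile of the buffered norms.
        tau = float(np.quantile(list(recent_norms), p))
        # Rescale the gradient so that its norm never exceeds tau.
        scale = min(1.0, tau / norm) if norm > 0.0 else 1.0
        theta -= step_size * scale * g
    return theta


# Toy usage: estimate a mean (objective 0.5 * ||theta - x||^2) from a
# stream that contains one grossly corrupted sample.
stream = [np.array([1.0])] * 50 + [np.array([1e9])] + [np.array([1.0])] * 50
theta_hat = quantile_clipped_sgd(lambda th, x: th - x, np.zeros(1), stream)
print(theta_hat)
```

In this toy stream the corrupted sample produces a gradient of norm about 1e9, but the 0.9-quantile of the buffered norms stays below 1, so the corresponding update is bounded by `step_size` and the iterate remains near the true mean of 1. Unclipped SGD with the same step size would be thrown off by roughly 5e7 in a single step.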

Taxonomy

Topics: Statistical Methods and Inference · Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent