Gradient Clipping Beyond Vector Norms: A Spectral Approach for Matrix-Valued Parameters
Alexander Yukhimchuk, Mladen Kolar, Martin Takáč, Sayantan Choudhury

TL;DR
This paper introduces spectral clipping, a gradient clipping method that stabilizes training by clamping the singular values of layer-wise gradient matrices that exceed a threshold, improving robustness to heavy-tailed gradient noise.
Contribution
It generalizes classical norm clipping to matrix-valued parameters, providing a spectral approach with convergence guarantees and efficient implementation for neural network training.
Findings
Spectral clipping stabilizes training by controlling dominant singular values.
Layer-wise adaptive thresholds reduce hyperparameter tuning.
Efficient randomized SVD implementation enables scalable spectral clipping.
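To illustrate how a randomized SVD could keep spectral clipping cheap, here is a minimal NumPy sketch. It is an assumption about one plausible approach, not the authors' implementation: it estimates only the top-k singular triplets with a standard randomized range finder and subtracts the part of each estimated singular value that exceeds the threshold. The names `randomized_topk_svd`, `spectral_clip_topk`, `oversample`, and `n_iter` are hypothetical.

```python
import numpy as np

def randomized_topk_svd(A, k, oversample=5, n_iter=2, seed=0):
    """Approximate the top-k singular triplets of A with a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    r = min(k + oversample, m, n)
    # Random projection followed by a few power iterations to sharpen the subspace.
    Q = np.linalg.qr(A @ rng.standard_normal((n, r)))[0]
    for _ in range(n_iter):
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    # Exact SVD of the small projected matrix B = Q^T A.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

def spectral_clip_topk(grad, tau, k=4):
    """Clamp the (estimated) top-k singular values of grad at tau.

    Assumption: at most k singular values exceed tau, so the rest of the
    spectrum can be left untouched without computing a full SVD.
    """
    U, s, Vt = randomized_topk_svd(grad, k)
    excess = np.maximum(s - tau, 0.0)  # zero wherever s <= tau
    return grad - (U * excess) @ Vt
```

The subtraction form avoids reconstructing the whole matrix from its factors: only the rank-k correction is formed, and when no estimated singular value exceeds `tau` the gradient is returned unchanged. Because the triplets are approximate, the clipped matrix meets the threshold only approximately.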
Abstract
Gradient clipping is a standard safeguard for training neural networks under noisy, heavy-tailed stochastic gradients; yet, most clipping rules treat all parameters as vectors and ignore the matrix structure of modern architectures. We show empirically that data outliers often amplify only a small number of leading singular values in layer-wise gradient matrices, while the rest of the spectrum remains largely unchanged. Motivated by this phenomenon, we propose spectral clipping, which stabilizes training by clamping singular values that exceed a threshold while preserving the singular directions. This framework generalizes classical gradient norm clipping and can be easily integrated into existing optimizers. We provide a convergence analysis for non-convex optimization with spectrally clipped SGD, yielding the optimal rate…
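The clamping rule described in the abstract can be sketched with a full SVD in NumPy. This is a minimal illustration under stated assumptions, not the paper's code: the function names `spectral_clip` and `norm_clip` and the threshold argument `tau` are hypothetical, and a practical implementation would likely avoid a full decomposition for large layers.

```python
import numpy as np

def spectral_clip(grad, tau):
    """Clamp singular values of a 2-D gradient at tau, keeping the singular vectors."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    s_clipped = np.minimum(s, tau)  # values below tau pass through unchanged
    return (U * s_clipped) @ Vt     # scale each column of U by its clamped singular value

def norm_clip(grad, tau):
    """Classical norm clipping: rescale so the Euclidean (Frobenius) norm is at most tau."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, tau / norm) if norm > 0 else grad
```

One way to see why this generalizes norm clipping: a vector viewed as an n-by-1 matrix has a single singular value equal to its Euclidean norm, so `spectral_clip` and `norm_clip` return the same result on it. On a genuine matrix they differ, since spectral clipping shrinks only the directions whose singular values exceed `tau` rather than rescaling the whole gradient.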
