Gradient Clipping Beyond Vector Norms: A Spectral Approach for Matrix-Valued Parameters

Alexander Yukhimchuk, Mladen Kolar, Martin Takáč, Sayantan Choudhury

arXiv:2605.11838 · cs.LG · May 13, 2026

TL;DR

This paper introduces spectral clipping, a gradient clipping method that stabilizes training by controlling the singular values of layer-wise gradient matrices, which improves robustness to heavy-tailed gradient noise.

Contribution

It generalizes classical gradient norm clipping to matrix-valued parameters, with convergence guarantees for non-convex optimization and an efficient implementation for neural network training.
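The abstract describes the clamping rule. Writing it as an operator shows one way it generalizes norm clipping. The notation $\operatorname{clip}_c$ and the threshold $c$ are illustrative and are not taken from the paper:

```latex
G = U \Sigma V^{\top}, \qquad
\operatorname{clip}_c(G) = U \,\min(\Sigma, c)\, V^{\top},
\qquad \bigl\|\operatorname{clip}_c(G)\bigr\|_2 = \min\bigl(\sigma_1(G), c\bigr).

% Vector case: g \in \mathbb{R}^n, g \neq 0, viewed as an n \times 1 matrix,
% so U = g/\|g\|_2, \; \Sigma = [\,\|g\|_2\,], \; V = [\,1\,]:
\operatorname{clip}_c(g)
  = \frac{g}{\|g\|_2}\,\min\bigl(\|g\|_2, c\bigr)
  = \min\Bigl(1, \frac{c}{\|g\|_2}\Bigr)\, g .
```

Here the minimum acts entrywise on the diagonal of $\Sigma$. The last line is exactly classical norm clipping. For a general matrix, clipping the Frobenius norm rescales every singular value by the same factor. The spectral rule lowers only the singular values above $c$.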

Findings

1. Spectral clipping stabilizes training by controlling the dominant singular values of gradient matrices.
2. Layer-wise adaptive thresholds reduce the need for hyperparameter tuning.
3. An efficient randomized SVD implementation makes spectral clipping scalable.
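This page does not describe the paper's randomized SVD routine. As a stand-in, here is a minimal sketch built on the standard randomized range finder of Halko, Martinsson and Tropp. It assumes that only the top `k` singular values can exceed the threshold, which is consistent with the abstract's observation that outliers inflate a few leading singular values. The function names and the parameters `k`, `oversample` and `n_iter` are hypothetical.

```python
import numpy as np


def randomized_top_svd(grad: np.ndarray, k: int, oversample: int = 5,
                       n_iter: int = 2, rng=None):
    """Approximate the top-k singular triplets of `grad` with a randomized range finder.

    Hypothetical helper; a stand-in, not the paper's routine.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = grad.shape
    p = min(k + oversample, m, n)            # sketch width
    y = grad @ rng.standard_normal((n, p))   # random projection of the column space
    # Power iterations sharpen the spectrum; re-orthonormalize each time for stability.
    for _ in range(n_iter):
        q, _ = np.linalg.qr(y)
        y = grad @ (grad.T @ q)
    q, _ = np.linalg.qr(y)                   # orthonormal basis for the approximate range
    b = q.T @ grad                           # small (p x n) matrix
    ub, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ ub                               # lift left singular vectors back to m dims
    return u[:, :k], s[:k], vt[:k, :]


def spectral_clip_lowrank(grad: np.ndarray, threshold: float, k: int = 4,
                          **kwargs) -> np.ndarray:
    """Clip only the leading k singular values: G - U_k diag(max(s_k - c, 0)) V_k^T."""
    u, s, vt = randomized_top_svd(grad, k, **kwargs)
    excess = np.maximum(s - threshold, 0.0)  # amount above the threshold, else 0
    return grad - (u * excess) @ vt          # remove only the excess along top directions
```

The sketch subtracts only the excess along the top-`k` directions and leaves the rest of the gradient untouched. It avoids a full SVD of every layer. If more than `k` singular values exceed the threshold, the sketch under-clips, so `k` trades cost against fidelity.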

Abstract

Gradient clipping is a standard safeguard for training neural networks under noisy, heavy-tailed stochastic gradients, yet most clipping rules treat all parameters as vectors and ignore the matrix structure of modern architectures. We show empirically that data outliers often amplify only a small number of leading singular values in layer-wise gradient matrices, while the rest of the spectrum remains largely unchanged. Motivated by this phenomenon, we propose spectral clipping, which stabilizes training by clamping singular values that exceed a threshold while preserving the singular directions. This framework generalizes classical gradient norm clipping and integrates easily into existing optimizers. We provide a convergence analysis for non-convex optimization with spectrally clipped SGD, yielding the optimal $O(K^{\frac{2 - 2\alpha}{3\alpha - 2}})$ rate…
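A full SVD is enough to sketch the clamping rule described above. This is a minimal NumPy sketch, not the authors' code. It assumes one fixed threshold per matrix, so the paper's layer-wise adaptive thresholds are not modelled. The names `spectral_clip`, `norm_clip` and `clipped_sgd_step` are hypothetical.

```python
import numpy as np


def spectral_clip(grad: np.ndarray, threshold: float) -> np.ndarray:
    """Clamp every singular value of a 2-D gradient that exceeds `threshold`.

    The singular vectors (U, V) are left unchanged; only the spectrum is edited.
    """
    if grad.ndim != 2:
        raise ValueError("spectral_clip expects a 2-D (matrix-valued) gradient")
    # Thin SVD: grad = U @ diag(s) @ Vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    s_clipped = np.minimum(s, threshold)
    # Scale the columns of U by the clipped singular values, then recombine.
    return (u * s_clipped) @ vt


def norm_clip(grad: np.ndarray, threshold: float) -> np.ndarray:
    """Classical clipping: rescale so the Euclidean/Frobenius norm is at most `threshold`."""
    norm = np.linalg.norm(grad)
    return grad if norm <= threshold else grad * (threshold / norm)


def clipped_sgd_step(param: np.ndarray, grad: np.ndarray,
                     lr: float, threshold: float) -> np.ndarray:
    """One spectrally clipped SGD step on a single matrix parameter (hypothetical helper)."""
    return param - lr * spectral_clip(grad, threshold)
```

For a gradient stored as an `n x 1` matrix, `spectral_clip` and `norm_clip` return the same result, because the single singular value is the vector's norm. For a genuine matrix, only the singular values above the threshold are lowered and the rest of the spectrum is kept.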
