Enhancing LLM Training via Spectral Clipping
Xiaowen Jiang, Andrei Semenov, Sebastian U. Stich

TL;DR
This paper introduces SPECTRA, a spectral clipping framework for LLM training that improves stability and generalization by controlling spectral norms of updates and gradients, leading to better validation performance.
Contribution
SPECTRA is a novel spectral clipping framework that enforces spectral-norm constraints and suppresses spectral noise, improving LLM training stability and performance over standard optimizers.
Findings
SPECTRA lowers validation loss across various optimizers.
Models trained with SPECTRA have smaller weight norms.
Spectral clipping methods achieve state-of-the-art results.
Abstract
While spectral-based optimizers like Muon operate directly on the spectrum of updates, standard adaptive methods such as AdamW do not account for the global spectral structure of weights and gradients, leaving them vulnerable to two empirical issues in large language model (LLM) training: (i) the optimizer updates can have large spectral norms, potentially destabilizing training and degrading generalization; (ii) stochastic gradient noise can exhibit sparse spectral spikes, with a few dominant singular values much larger than the rest. We propose SPECTRA, a general framework that addresses both issues through (i) post-spectral clipping of updates to enforce spectral-norm constraints and (ii) optional pre-spectral clipping of gradients to suppress spectral noise spikes. We prove that post-clipping constitutes a Composite Frank-Wolfe method with spectral-norm constraints and weight regularization,…
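To make the idea of spectral clipping concrete, here is a minimal sketch in Python with NumPy. It assumes that "clipping" means capping each singular value of a matrix at a threshold through a full SVD. The abstract above is truncated and does not specify the exact procedure, so the function name `spectral_clip` and the parameter `max_sv` are hypothetical, and the paper's actual method may differ (for example, it may avoid an explicit SVD).

```python
import numpy as np


def spectral_clip(matrix, max_sv):
    """Cap every singular value of `matrix` at `max_sv`.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Thin SVD: matrix = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Clip the spectrum, then rebuild the matrix with the same singular vectors.
    clipped_s = np.minimum(s, max_sv)
    return (u * clipped_s) @ vt


# Stand-in for an optimizer update of a 4x3 weight matrix (random, illustrative).
rng = np.random.default_rng(0)
update = rng.normal(size=(4, 3))

clipped = spectral_clip(update, max_sv=1.0)

# The spectral norm (largest singular value) is now at most the threshold.
spectral_norm_after = np.linalg.norm(clipped, 2)
```

Under the same assumption, applying the helper to an optimizer update would correspond to the post-clipping step described above, and applying it to a raw gradient before the optimizer sees it would correspond to the optional pre-clipping step. A matrix whose singular values are already below the threshold passes through unchanged.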
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Advanced Neural Network Applications
