Decoupling Variance and Scale-Invariant Updates in Adaptive Gradient Descent for Unified Vector and Matrix Optimization
Zitao Song, Cedar Site Bai, Zhe Zhang, Brian Bullins, David F. Gleich

TL;DR
This paper introduces DeVA, a framework that decouples variance adaptation from scale-invariant updates in adaptive gradient methods. It applies to both vector and matrix optimization problems and is reported to outperform Muon and SOAP.
Contribution
The paper presents DeVA, a novel approach that bridges vector and matrix adaptive optimization by decoupling variance and scale-invariant updates, improving convergence and performance.
Findings
DeVA outperforms state-of-the-art methods like Muon and SOAP.
DeVA reduces token usage by approximately 6.6%.
Variance adaptation improves blockwise smoothness and convergence speed.
Abstract
Adaptive methods like Adam have become the standard for large-scale vector and Euclidean optimization due to their coordinate-wise adaptation with a second-order nature. More recently, matrix-based spectral optimizers like Muon (Jordan et al., 2024b) show the power of treating weight matrices as matrices rather than long vectors. Linking these is hard because many natural generalizations are not feasible to implement, and we also cannot simply move the Adam adaptation to the matrix spectrum. To address this, we reformulate the AdaGrad update and decompose it into a variance adaptation term and a scale-invariant term. This decoupling produces DeVA (Decoupled Variance Adaptation), a framework that bridges vector-based variance adaptation and matrix spectral optimization, enabling a seamless transition from Adam to…
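The decomposition described in the abstract can be illustrated, for the plain vector case, with a minimal sketch. It assumes the familiar elementwise rewrite of an Adam-style direction, m / sqrt(v) = sign(m) * (|m| / sqrt(v)), where sign(m) plays the role of a scale-invariant term and |m| / sqrt(v) the role of a variance adaptation factor. This is an assumption about the flavor of the split, not the paper's exact AdaGrad reformulation or its matrix extension, neither of which appears in this excerpt. The function names and sample values are hypothetical.

```python
import math

def adam_style_direction(m, v, eps=1e-12):
    """Elementwise m / sqrt(v): the usual adaptive step direction."""
    return [mi / (math.sqrt(vi) + eps) for mi, vi in zip(m, v)]

def decoupled_direction(m, v, eps=1e-12):
    """Same direction, written as sign(m) * (|m| / sqrt(v)).

    Assumed split for illustration only:
      sign    -> scale-invariant term (unchanged if m is rescaled by c > 0)
      damping -> variance adaptation factor (shrinks noisy coordinates)
    """
    sign = [(mi > 0) - (mi < 0) for mi in m]
    damping = [abs(mi) / (math.sqrt(vi) + eps) for mi, vi in zip(m, v)]
    combined = [s * d for s, d in zip(sign, damping)]
    return sign, damping, combined

# Hypothetical first- and second-moment estimates.
m = [0.5, -2.0, 0.0]
v = [1.0, 4.0, 9.0]

sign, damping, combined = decoupled_direction(m, v)
reference = adam_style_direction(m, v)
```

Here `combined` matches `reference` coordinate by coordinate, while `sign` alone stays the same if `m` is multiplied by any positive constant. In the matrix setting, spectral optimizers such as Muon replace the elementwise sign with an orthogonalized update; how DeVA attaches variance adaptation to that step is not specified in the truncated abstract.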
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science · Advanced NMR Techniques and Applications
