PRISM: Distribution-free Adaptive Computation of Matrix Functions for Accelerating Neural Network Training
Shenghao Yang, Zhichao Wang, Oleg Balabanov, N. Benjamin Erichson, Michael W. Mahoney

TL;DR
PRISM is a novel framework that accelerates matrix function computations in neural network training by combining adaptive polynomial approximation with randomized sketching, eliminating the need for spectral bounds and improving efficiency.
Contribution
PRISM introduces a general, adaptive, and spectrum-agnostic method for accelerating matrix function computations in neural network optimization.
Findings
PRISM accelerates training when integrated with Shampoo and Muon optimizers.
It requires no explicit spectral bounds or singular value estimates.
Empirical results show significant speedups in neural network training.
Abstract
Matrix functions such as square root, inverse roots, and orthogonalization play a central role in preconditioned gradient methods for neural network training. This has motivated the development of iterative algorithms that avoid explicit eigendecompositions and rely primarily on matrix multiplications, making them well suited for modern GPU accelerators. We present PRISM (Polynomial-fitting and Randomized Iterative Sketching for Matrix functions computation), a general framework for accelerating iterative algorithms for computing matrix functions. PRISM combines adaptive polynomial approximation with randomized sketching: at each iteration, it fits a polynomial surrogate to the current spectrum via a sketched least-squares problem, adapting to the instance at hand with minimal overhead. We apply PRISM to accelerate Newton-Schulz-like iterations for matrix square roots and…
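For readers unfamiliar with the kind of iteration the abstract refers to, the following is a minimal sketch of the classical fixed-coefficient Newton-Schulz iteration for orthogonalization (computing the polar factor of a matrix). It is not PRISM itself: the coefficients 1.5 and 0.5 are the textbook ones and are not adapted to the spectrum, and no sketching is used. The function name `newton_schulz_orthogonalize` and the `num_iters` parameter are hypothetical names chosen for illustration, and the Frobenius-norm scaling is one common way to guarantee convergence, assumed here rather than taken from the paper.

```python
import numpy as np


def newton_schulz_orthogonalize(A, num_iters=50):
    """Approximate the polar factor U V^T of a full-rank matrix A.

    Classical Newton-Schulz baseline with fixed coefficients; this is an
    illustrative sketch, not the adaptive method described in the paper.
    """
    # Scale so that every singular value lies in (0, 1]; the iteration
    # converges when the largest singular value is below sqrt(3).
    X = A / np.linalg.norm(A)  # Frobenius norm by default for 2-D input
    for _ in range(num_iters):
        # Each step maps a singular value s to 1.5*s - 0.5*s**3,
        # which pushes s toward 1 using only matrix multiplications.
        X = 1.5 * X - 0.5 * X @ (X.T @ X)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))
    Q = newton_schulz_orthogonalize(A)
    # Columns of Q should be (numerically) orthonormal.
    print(np.linalg.norm(Q.T @ Q - np.eye(4)))
```

The fixed polynomial 1.5s - 0.5s^3 converges slowly when some singular values are very small, which is the inefficiency that spectrum-adapted polynomial coefficients, as described in the abstract, aim to reduce.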
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Machine Learning in Materials Science
