Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?
Weijie Su

TL;DR
This paper introduces an isotropic curvature model for analyzing a single iteration of deep learning optimization. Under this model, gradient orthogonalization moves the update in the right direction but is not necessarily optimal, a result that can guide the design of future optimizers.
Contribution
The paper develops a convex isotropic curvature model for understanding weight updates and analyzes the optimality of gradient orthogonalization in deep learning optimization.
Findings
The optimal update makes the singular values of the gradient more homogeneous.
Gradient orthogonalization is directionally correct but not strictly optimal.
The model provides insights for designing new deep learning optimizers.
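The findings above can be made concrete with a small numerical sketch. It assumes NumPy and uses an exact SVD to orthogonalize a matrix gradient, which is a simplification: practical implementations of Muon approximate this step with a Newton-Schulz iteration. The function `homogenize_spectrum` and its `power` parameter are hypothetical, introduced only to illustrate what a "more homogeneous" spectrum looks like; they are not the optimal update derived in the paper.

```python
import numpy as np


def orthogonalize(grad):
    """Set every singular value of grad to 1, keeping its singular vectors."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt


def homogenize_spectrum(grad, power=0.5):
    """Illustrative family (an assumption, not the paper's formula).

    Raises each singular value to `power` in [0, 1]:
    power=1 returns the raw gradient, power=0 the orthogonalized gradient,
    and values in between shrink the spread of the spectrum.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ np.diag(s ** power) @ vt


rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))  # stand-in for a weight-matrix gradient

for name, update in [
    ("raw gradient", grad),
    ("power = 0.5", homogenize_spectrum(grad, 0.5)),
    ("orthogonalized", orthogonalize(grad)),
]:
    s = np.linalg.svd(update, compute_uv=False)
    # The ratio of largest to smallest singular value measures the spread.
    print(f"{name:>15}: spread = {s.max() / s.min():.3f}")
```

Orthogonalization sits at one extreme of this family, where all singular values are equal. The paper's claim is that moving toward that extreme is the right direction, while the best point need not be the extreme itself.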
Abstract
In this paper, we introduce a model for analyzing deep learning optimization over a single iteration by leveraging the matrix structure of the weights. We derive the model by assuming isotropy of curvature, including the second-order Hessian and higher-order terms, of the loss function across all perturbation directions; hence, we call it the isotropic curvature model. This model is a convex optimization program amenable to analysis, which allows us to understand how an update on the weights in the form of a matrix relates to the change in the total loss function. As an application, we use the isotropic curvature model to analyze the recently introduced Muon optimizer and other matrix-gradient methods for training language models. First, we show that under a general growth condition on the curvature, the optimal update matrix is obtained by making the spectrum of the original gradient…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science · Machine Learning and Data Classification
